This dataset was downloaded on Kaggle and it contains football player data scraped from Transfermarkt, that is updated weekly. It includes information on over 60,000 games played across multiple seasons and major competitions, with details on over 400 clubs and 30,000 players.
The objective of this study, in the first instance is to make some statistical analyses to understand:
Who are the 20 players that scored more goals
Who are the 20 players that assisted more times
Who has more contribution to goals
Who are the 20 players that need less time to score a goal
Who are the 20 players that need less time to assist someone to score
Top 20 players that got more yellow cards
Top 20 players that got more red cards
Top 20 players with more card combination.
In a second instance I'm going to analyze Sporting Clube de Portugal. For some people that don't know much about this club, it's the club where golden ball winners Cristiano Ronaldo and Luis Figo started to play, it was also the first professional club for so many players as Nani, Quaresma, Joao Moutinho , Rafael Leao etc.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats as st
import seaborn as sns
import plotly.express as px
presencas = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/appearances.csv')
presencas.head()
| appearance_id | game_id | player_id | player_club_id | player_current_club_id | date | player_name | competition_id | yellow_cards | red_cards | goals | assists | minutes_played | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2231978_38004 | 2231978 | 38004 | 853 | 235 | 2012-07-03 | Aurélien Joachim | CLQ | 0 | 0 | 2 | 0 | 90 |
| 1 | 2233748_79232 | 2233748 | 79232 | 8841 | 2698 | 2012-07-05 | Ruslan Abyshov | ELQ | 0 | 0 | 0 | 0 | 90 |
| 2 | 2234413_42792 | 2234413 | 42792 | 6251 | 465 | 2012-07-05 | Sander Puri | ELQ | 0 | 0 | 0 | 0 | 45 |
| 3 | 2234418_73333 | 2234418 | 73333 | 1274 | 6646 | 2012-07-05 | Vegar Hedenstad | ELQ | 0 | 0 | 0 | 0 | 90 |
| 4 | 2234421_122011 | 2234421 | 122011 | 195 | 3008 | 2012-07-05 | Markus Henriksen | ELQ | 0 | 0 | 0 | 1 | 90 |
presencas['competition_id'].unique()
array(['CLQ', 'ELQ', 'UKRS', 'RUP', 'UKR1', 'DK1', 'RUSS', 'RU1', 'BESC',
'UKRP', 'BE1', 'FRCH', 'POCP', 'SC1', 'NLSC', 'CIT', 'FR1', 'NL1',
'SCI', 'POSU', 'DFL', 'GBCS', 'DFB', 'TR1', 'PO1', 'GB1', 'ES1',
'SUC', 'L1', 'GR1', 'IT1', 'CDR', 'DKP', 'USC', 'CL', 'EL', 'NLP',
'SFA', 'GRP', 'FAC', 'KLUB', 'ECLQ', 'CGB'], dtype=object)
presencas.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1524126 entries, 0 to 1524125 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524126 non-null object 1 game_id 1524126 non-null int64 2 player_id 1524126 non-null int64 3 player_club_id 1524126 non-null int64 4 player_current_club_id 1524126 non-null int64 5 date 1524126 non-null object 6 player_name 1524125 non-null object 7 competition_id 1524126 non-null object 8 yellow_cards 1524126 non-null int64 9 red_cards 1524126 non-null int64 10 goals 1524126 non-null int64 11 assists 1524126 non-null int64 12 minutes_played 1524126 non-null int64 dtypes: int64(9), object(4) memory usage: 500.6 MB
presencas.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1524126 entries, 0 to 1524125 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524126 non-null object 1 game_id 1524126 non-null int64 2 player_id 1524126 non-null int64 3 player_club_id 1524126 non-null int64 4 player_current_club_id 1524126 non-null int64 5 date 1524126 non-null object 6 player_name 1524125 non-null object 7 competition_id 1524126 non-null object 8 yellow_cards 1524126 non-null int64 9 red_cards 1524126 non-null int64 10 goals 1524126 non-null int64 11 assists 1524126 non-null int64 12 minutes_played 1524126 non-null int64 dtypes: int64(9), object(4) memory usage: 500.6 MB
clube_jogos = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/club_games.csv')
clube_jogos.head()
| game_id | club_id | own_goals | own_position | own_manager_name | opponent_id | opponent_goals | opponent_position | opponent_manager_name | hosting | is_win | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2320450 | 1468 | 0 | NaN | Holger Bachthaler | 24 | 2 | NaN | Armin Veh | Home | 0 |
| 1 | 2320460 | 1 | 3 | NaN | Jürgen Luginger | 86 | 1 | NaN | Robin Dutt | Home | 1 |
| 2 | 2320472 | 2036 | 4 | NaN | Frank Schmidt | 72 | 5 | NaN | Alexander Schmidt | Home | 0 |
| 3 | 2321044 | 16 | 2 | 1.0 | Jürgen Klopp | 23 | 1 | 15.0 | Torsten Lieberknecht | Home | 1 |
| 4 | 2321060 | 23 | 0 | 18.0 | Torsten Lieberknecht | 24 | 2 | 11.0 | Armin Veh | Home | 0 |
clube_jogos.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 132330 entries, 0 to 132329 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 game_id 132330 non-null int64 1 club_id 132330 non-null int64 2 own_goals 132330 non-null int64 3 own_position 92890 non-null float64 4 own_manager_name 130842 non-null object 5 opponent_id 132330 non-null int64 6 opponent_goals 132330 non-null int64 7 opponent_position 92890 non-null float64 8 opponent_manager_name 130842 non-null object 9 hosting 132330 non-null object 10 is_win 132330 non-null int64 dtypes: float64(2), int64(6), object(3) memory usage: 34.5 MB
clubes = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/clubs.csv')
clubes.head()
| club_id | club_code | name | domestic_competition_id | total_market_value | squad_size | average_age | foreigners_number | foreigners_percentage | national_team_players | stadium_name | stadium_seats | net_transfer_record | coach_name | last_season | filename | url | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 105 | sv-darmstadt-98 | Sportverein Darmstadt 1898 e. V. | L1 | NaN | 30 | 26.7 | 11 | 36.7 | 1 | Merck-Stadion am Böllenfalltor | 17810 | €-1.48m | NaN | 2023 | ../data/raw/transfermarkt-scraper/2023/clubs.j... | https://www.transfermarkt.co.uk/sv-darmstadt-9... |
| 1 | 11127 | ural-ekaterinburg | FK Ural Yekaterinburg | RU1 | NaN | 27 | 27.9 | 15 | 55.6 | 6 | Yekaterinburg Arena | 23000 | €-770k | NaN | 2023 | ../data/raw/transfermarkt-scraper/2023/clubs.j... | https://www.transfermarkt.co.uk/ural-ekaterinb... |
| 2 | 114 | besiktas-istanbul | Beşiktaş Jimnastik Kulübü | TR1 | NaN | 30 | 27.7 | 14 | 46.7 | 12 | Tüpraş Stadyumu | 42590 | €-14.50m | NaN | 2023 | ../data/raw/transfermarkt-scraper/2023/clubs.j... | https://www.transfermarkt.co.uk/besiktas-istan... |
| 3 | 12 | as-rom | Associazione Sportiva Roma | IT1 | NaN | 27 | 27.2 | 18 | 66.7 | 17 | Olimpico di Roma | 73261 | +€64.15m | NaN | 2023 | ../data/raw/transfermarkt-scraper/2023/clubs.j... | https://www.transfermarkt.co.uk/as-rom/startse... |
| 4 | 148 | tottenham-hotspur | Tottenham Hotspur Football Club | GB1 | NaN | 28 | 26.0 | 22 | 78.6 | 21 | Tottenham Hotspur Stadium | 62850 | €-154.90m | NaN | 2023 | ../data/raw/transfermarkt-scraper/2023/clubs.j... | https://www.transfermarkt.co.uk/tottenham-hots... |
clubes.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 426 entries, 0 to 425 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 club_id 426 non-null int64 1 club_code 426 non-null object 2 name 426 non-null object 3 domestic_competition_id 426 non-null object 4 total_market_value 0 non-null float64 5 squad_size 426 non-null int64 6 average_age 388 non-null float64 7 foreigners_number 426 non-null int64 8 foreigners_percentage 378 non-null float64 9 national_team_players 426 non-null int64 10 stadium_name 426 non-null object 11 stadium_seats 426 non-null int64 12 net_transfer_record 426 non-null object 13 coach_name 0 non-null float64 14 last_season 426 non-null int64 15 filename 426 non-null object 16 url 426 non-null object dtypes: float64(4), int64(6), object(7) memory usage: 289.5 KB
competicoes = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/competitions.csv')
competicoes.head()
| competition_id | competition_code | name | sub_type | type | country_id | country_name | domestic_league_code | confederation | url | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CIT | italy-cup | italy-cup | domestic_cup | domestic_cup | 75 | Italy | IT1 | europa | https://www.transfermarkt.co.uk/italy-cup/star... |
| 1 | NLSC | johan-cruijff-schaal | johan-cruijff-schaal | domestic_super_cup | other | 122 | Netherlands | NL1 | europa | https://www.transfermarkt.co.uk/johan-cruijff-... |
| 2 | GRP | kypello-elladas | kypello-elladas | domestic_cup | domestic_cup | 56 | Greece | GR1 | europa | https://www.transfermarkt.co.uk/kypello-ellada... |
| 3 | POSU | supertaca-candido-de-oliveira | supertaca-candido-de-oliveira | domestic_super_cup | other | 136 | Portugal | PO1 | europa | https://www.transfermarkt.co.uk/supertaca-cand... |
| 4 | RUSS | russian-super-cup | russian-super-cup | domestic_super_cup | other | 141 | Russia | RU1 | europa | https://www.transfermarkt.co.uk/russian-super-... |
competicoes.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 43 entries, 0 to 42 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 competition_id 43 non-null object 1 competition_code 43 non-null object 2 name 43 non-null object 3 sub_type 43 non-null object 4 type 43 non-null object 5 country_id 43 non-null int64 6 country_name 36 non-null object 7 domestic_league_code 36 non-null object 8 confederation 43 non-null object 9 url 43 non-null object dtypes: int64(1), object(9) memory usage: 27.8 KB
eventos = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/game_events.csv')
eventos.head()
| game_event_id | date | game_id | minute | type | club_id | player_id | description | player_in_id | player_assist_id | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2f41da30c471492e7d4a984951671677 | 2012-08-05 | 2211607 | 77 | Cards | 610 | 4425 | 1. Yellow card , Mass confrontation | NaN | NaN |
| 1 | a72f7186d132775f234d3e2f7bc0ed5b | 2012-08-05 | 2211607 | 77 | Cards | 383 | 33210 | 1. Yellow card , Mass confrontation | NaN | NaN |
| 2 | b2d721eaed4692a5c59a92323689ef18 | 2012-08-05 | 2211607 | 3 | Goals | 383 | 36500 | , Header, 1. Tournament Goal Assist: , Corner,... | NaN | 56416.0 |
| 3 | aef768899cedac0c9a650980219075a2 | 2012-08-05 | 2211607 | 53 | Goals | 383 | 36500 | , Right-footed shot, 2. Tournament Goal Assist... | NaN | 146258.0 |
| 4 | 5d6d9533023057b6619ecd145a038bbe | 2012-08-05 | 2211607 | 74 | Substitutions | 383 | 36500 | , Not reported | 49499.0 | NaN |
eventos.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 681448 entries, 0 to 681447 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 game_event_id 681448 non-null object 1 date 681448 non-null object 2 game_id 681448 non-null int64 3 minute 681448 non-null int64 4 type 681448 non-null object 5 club_id 681448 non-null int64 6 player_id 681448 non-null int64 7 description 344096 non-null object 8 player_in_id 429349 non-null float64 9 player_assist_id 32931 non-null float64 dtypes: float64(2), int64(4), object(4) memory usage: 214.9 MB
eventos.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 681448 entries, 0 to 681447 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 game_event_id 681448 non-null object 1 date 681448 non-null object 2 game_id 681448 non-null int64 3 minute 681448 non-null int64 4 type 681448 non-null object 5 club_id 681448 non-null int64 6 player_id 681448 non-null int64 7 description 344096 non-null object 8 player_in_id 429349 non-null float64 9 player_assist_id 32931 non-null float64 dtypes: float64(2), int64(4), object(4) memory usage: 214.9 MB
escalados = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/game_lineups.csv')
escalados.head()
| game_lineups_id | game_id | club_id | type | number | player_id | player_name | team_captain | position | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | b2dbe01c3656b06c8e23e9de714e26bb | 2317258 | 610 | substitutes | 5 | 1443 | Christian Poulsen | 0 | Defensive Midfield |
| 1 | b50a3ec6d52fd1490aab42042ac4f738 | 2317258 | 610 | starting_lineup | 4 | 5017 | Niklas Moisander | 0 | Centre-Back |
| 2 | 7d890e6d0ff8af84b065839966a0ec81 | 2317258 | 1090 | substitutes | 11 | 9602 | Maarten Martens | 0 | Left Winger |
| 3 | 8c355268678b9bbc7084221b1f0fde36 | 2317258 | 610 | starting_lineup | 17 | 12282 | Daley Blind | 0 | Left-Back |
| 4 | 76193074d549e5fdce4cdcbba0d66247 | 2317258 | 1090 | starting_lineup | 23 | 25427 | Roy Beerens | 0 | Right Winger |
escalados.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2303206 entries, 0 to 2303205 Data columns (total 9 columns): # Column Dtype --- ------ ----- 0 game_lineups_id object 1 game_id int64 2 club_id int64 3 type object 4 number object 5 player_id int64 6 player_name object 7 team_captain int64 8 position object dtypes: int64(4), object(5) memory usage: 867.3 MB
jogos = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/games.csv')
jogos.head()
| game_id | competition_id | season | round | date | home_club_id | away_club_id | home_club_goals | away_club_goals | home_club_position | ... | stadium | attendance | referee | url | home_club_formation | away_club_formation | home_club_name | away_club_name | aggregate | competition_type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2321044 | L1 | 2013 | 2. Matchday | 2013-08-18 | 16 | 23 | 2 | 1 | 1.0 | ... | SIGNAL IDUNA PARK | 80200.0 | Peter Sippel | https://www.transfermarkt.co.uk/borussia-dortm... | 4-2-3-1 | 4-3-2-1 | Borussia Dortmund | Eintracht Braunschweig | 2:1 | domestic_league |
| 1 | 2321060 | L1 | 2013 | 3. Matchday | 2013-08-25 | 23 | 24 | 0 | 2 | 18.0 | ... | EINTRACHT-Stadion | 23325.0 | Wolfgang Stark | https://www.transfermarkt.co.uk/eintracht-brau... | 4-3-2-1 | 4-2-3-1 | Eintracht Braunschweig | Eintracht Frankfurt Fußball AG | 0:2 | domestic_league |
| 2 | 2321086 | L1 | 2013 | 6. Matchday | 2013-09-21 | 4 | 16 | 1 | 1 | 15.0 | ... | Max-Morlock-Stadion | 50000.0 | Knut Kircher | https://www.transfermarkt.co.uk/1-fc-nuremberg... | 4-2-3-1 | 4-2-3-1 | 1.FC Nuremberg | Borussia Dortmund | 1:1 | domestic_league |
| 3 | 2321152 | L1 | 2013 | 11. Matchday | 2013-11-02 | 44 | 33 | 0 | 2 | 7.0 | ... | Olympiastadion Berlin | 69277.0 | Günter Perl | https://www.transfermarkt.co.uk/hertha-bsc_fc-... | 4-2-3-1 | 4-2-3-1 | Hertha BSC | FC Schalke 04 | 0:2 | domestic_league |
| 4 | 2321205 | L1 | 2013 | 17. Matchday | 2013-12-21 | 41 | 39 | 2 | 3 | 14.0 | ... | Volksparkstadion | 50000.0 | Bastian Dankert | https://www.transfermarkt.co.uk/hamburger-sv_1... | 4-2-3-1 | 4-4-2 Diamond | Hamburger SV | 1. Fußball- und Sportverein Mainz 05 | 2:3 | domestic_league |
5 rows × 23 columns
jogos.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 66165 entries, 0 to 66164 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 game_id 66165 non-null int64 1 competition_id 66165 non-null object 2 season 66165 non-null int64 3 round 66165 non-null object 4 date 66165 non-null object 5 home_club_id 66165 non-null int64 6 away_club_id 66165 non-null int64 7 home_club_goals 66165 non-null int64 8 away_club_goals 66165 non-null int64 9 home_club_position 46445 non-null float64 10 away_club_position 46445 non-null float64 11 home_club_manager_name 65421 non-null object 12 away_club_manager_name 65421 non-null object 13 stadium 65952 non-null object 14 attendance 56502 non-null float64 15 referee 65570 non-null object 16 url 66165 non-null object 17 home_club_formation 59889 non-null object 18 away_club_formation 60037 non-null object 19 home_club_name 54787 non-null object 20 away_club_name 56022 non-null object 21 aggregate 66165 non-null object 22 competition_type 66165 non-null object dtypes: float64(3), int64(6), object(14) memory usage: 70.9 MB
valor = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/player_valuations.csv')
valor.head()
| player_id | date | market_value_in_eur | current_club_id | player_club_domestic_competition_id | |
|---|---|---|---|---|---|
| 0 | 405973 | 2000-01-20 | 150000 | 3057 | BE1 |
| 1 | 342216 | 2001-07-20 | 100000 | 1241 | SC1 |
| 2 | 3132 | 2003-12-09 | 400000 | 126 | TR1 |
| 3 | 6893 | 2003-12-15 | 900000 | 984 | GB1 |
| 4 | 10 | 2004-10-04 | 7000000 | 398 | IT1 |
valor.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 464801 entries, 0 to 464800 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 player_id 464801 non-null int64 1 date 464801 non-null object 2 market_value_in_eur 464801 non-null int64 3 current_club_id 464801 non-null int64 4 player_club_domestic_competition_id 464801 non-null object dtypes: int64(3), object(2) memory usage: 66.9 MB
jogadores = pd.read_csv('/Users/andrewferreira/Downloads/archive (1)/players.csv')
jogadores.head()
| player_id | first_name | last_name | name | last_season | current_club_id | player_code | country_of_birth | city_of_birth | country_of_citizenship | ... | foot | height_in_cm | contract_expiration_date | agent_name | image_url | url | current_club_domestic_competition_id | current_club_name | market_value_in_eur | highest_market_value_in_eur | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Miroslav | Klose | Miroslav Klose | 2015 | 398 | miroslav-klose | Poland | Opole | Germany | ... | right | 184.0 | NaN | ASBW Sport Marketing | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/miroslav-klose... | IT1 | Società Sportiva Lazio S.p.A. | 1000000.0 | 30000000.0 |
| 1 | 26 | Roman | Weidenfeller | Roman Weidenfeller | 2017 | 16 | roman-weidenfeller | Germany | Diez | Germany | ... | left | 190.0 | NaN | Neubauer 13 GmbH | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/roman-weidenfe... | L1 | Borussia Dortmund | 750000.0 | 8000000.0 |
| 2 | 65 | Dimitar | Berbatov | Dimitar Berbatov | 2015 | 1091 | dimitar-berbatov | Bulgaria | Blagoevgrad | Bulgaria | ... | NaN | NaN | NaN | CSKA-AS-23 Ltd. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/dimitar-berbat... | GR1 | Panthessalonikios Athlitikos Omilos Konstantin... | 1000000.0 | 34500000.0 |
| 3 | 77 | NaN | Lúcio | Lúcio | 2012 | 506 | lucio | Brazil | Brasília | Brazil | ... | NaN | NaN | NaN | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/lucio/profil/s... | IT1 | Juventus Football Club | 200000.0 | 24500000.0 |
| 4 | 80 | Tom | Starke | Tom Starke | 2017 | 27 | tom-starke | East Germany (GDR) | Freital | Germany | ... | right | 194.0 | NaN | IFM | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/tom-starke/pro... | L1 | FC Bayern München | 100000.0 | 3000000.0 |
5 rows × 23 columns
jogadores.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 30383 entries, 0 to 30382 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 player_id 30383 non-null int64 1 first_name 28403 non-null object 2 last_name 30383 non-null object 3 name 30383 non-null object 4 last_season 30383 non-null int64 5 current_club_id 30383 non-null int64 6 player_code 30383 non-null object 7 country_of_birth 27697 non-null object 8 city_of_birth 28196 non-null object 9 country_of_citizenship 29829 non-null object 10 date_of_birth 30337 non-null object 11 sub_position 30209 non-null object 12 position 30383 non-null object 13 foot 28013 non-null object 14 height_in_cm 28306 non-null float64 15 contract_expiration_date 18924 non-null object 16 agent_name 15060 non-null object 17 image_url 30383 non-null object 18 url 30383 non-null object 19 current_club_domestic_competition_id 30383 non-null object 20 current_club_name 30383 non-null object 21 market_value_in_eur 29260 non-null float64 22 highest_market_value_in_eur 29260 non-null float64 dtypes: float64(3), int64(3), object(17) memory usage: 37.4 MB
After loading all libraries and data, we created the following dataframes:
- presencas - What are the games and competitions in which the players participated, as well as some statistics.
- clube_jogos - It contains the game identifier as well as the club and their opponent.
- clubes - Club identifier and the identifier of which the domentica competition.
- competicoes - Competition identifier.
- eventos - Contains the game and player identifier.
- escalados - Refers to the way the player interacted in this game. Whether it was a substitute used or a starter.
- jogos - It contains the game identifier, which club plays at home/away and the competition.
- valor - Market value of the player in euros.
- jogadores - List with players, what is their club, as well as other information.
Note that we can see there are a lot of missing values in some of our dataframes. I will check out some of the missing values and see if it's relevant to change those missing values, delete them or simply just keep them in our dataframe.
(Note that I used other names in my dataframes)
cels_ausentes = presencas[presencas['player_name'].isnull()]
cels_ausentes
| appearance_id | game_id | player_id | player_club_id | player_current_club_id | date | player_name | competition_id | yellow_cards | red_cards | goals | assists | minutes_played | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 797980 | 3084062_380365 | 3084062 | 380365 | 16486 | -1 | 2018-09-05 | NaN | CDR | 1 | 0 | 0 | 0 | 90 |
presencas = presencas.dropna()
presencas.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1524125 entries, 0 to 1524125 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524125 non-null object 1 game_id 1524125 non-null int64 2 player_id 1524125 non-null int64 3 player_club_id 1524125 non-null int64 4 player_current_club_id 1524125 non-null int64 5 date 1524125 non-null object 6 player_name 1524125 non-null object 7 competition_id 1524125 non-null object 8 yellow_cards 1524125 non-null int64 9 red_cards 1524125 non-null int64 10 goals 1524125 non-null int64 11 assists 1524125 non-null int64 12 minutes_played 1524125 non-null int64 dtypes: int64(9), object(4) memory usage: 162.8+ MB
cels_ausentes1 = clube_jogos[clube_jogos['own_position'].isnull()]
cels_ausentes1.head()
| game_id | club_id | own_goals | own_position | own_manager_name | opponent_id | opponent_goals | opponent_position | opponent_manager_name | hosting | is_win | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2320450 | 1468 | 0 | NaN | Holger Bachthaler | 24 | 2 | NaN | Armin Veh | Home | 0 |
| 1 | 2320460 | 1 | 3 | NaN | Jürgen Luginger | 86 | 1 | NaN | Robin Dutt | Home | 1 |
| 2 | 2320472 | 2036 | 4 | NaN | Frank Schmidt | 72 | 5 | NaN | Alexander Schmidt | Home | 0 |
| 13 | 2321976 | 10604 | 1 | NaN | Branko Nisevic | 448 | 4 | NaN | Marko Nikolic | Home | 0 |
| 14 | 2321990 | 1257 | 2 | NaN | Dan Theis | 2275 | 0 | NaN | Marko Rajamäki | Home | 1 |
cels_ausentes2 = clube_jogos[clube_jogos['own_manager_name'].isnull()]
cels_ausentes2.head()
| game_id | club_id | own_goals | own_position | own_manager_name | opponent_id | opponent_goals | opponent_position | opponent_manager_name | hosting | is_win | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 167 | 2366500 | 20417 | 5 | NaN | NaN | 22235 | 0 | NaN | NaN | Home | 1 |
| 169 | 2366512 | 24391 | 0 | NaN | NaN | 369 | 5 | NaN | NaN | Home | 0 |
| 194 | 2381474 | 3609 | 3 | NaN | NaN | 6672 | 0 | NaN | NaN | Home | 1 |
| 212 | 2422830 | 265 | 1 | 4.0 | NaN | 21957 | 0 | 16.0 | NaN | Home | 1 |
| 344 | 2481203 | 11259 | 10 | NaN | NaN | 18669 | 9 | NaN | NaN | Home | 1 |
cels_ausentes3 = clubes[clubes['average_age'].isnull()]
cels_ausentes3.head()
| club_id | club_code | name | domestic_competition_id | total_market_value | squad_size | average_age | foreigners_number | foreigners_percentage | national_team_players | stadium_name | stadium_seats | net_transfer_record | coach_name | last_season | filename | url | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 21957 | ao-platanias | AO Platanias | GR1 | NaN | 0 | NaN | 0 | NaN | 0 | Dimotiko Gipedo Perivolia | 3700 | +-0 | NaN | 2017 | ../data/raw/transfermarkt-scraper/2017/clubs.j... | https://www.transfermarkt.co.uk/ao-platanias/s... |
| 25 | 6996 | goverla-uzhgorod | Goverla Uzhgorod (- 2016) | UKR1 | NaN | 0 | NaN | 0 | NaN | 0 | Avangard | 10640 | +-0 | NaN | 2015 | ../data/raw/transfermarkt-scraper/2015/clubs.j... | https://www.transfermarkt.co.uk/goverla-uzhgor... |
| 38 | 2477 | karpaty-lviv | Karpaty Lviv (-2021) | UKR1 | NaN | 0 | NaN | 0 | NaN | 0 | Stadion Ukraina | 28051 | +-0 | NaN | 2019 | ../data/raw/transfermarkt-scraper/2019/clubs.j... | https://www.transfermarkt.co.uk/karpaty-lviv/s... |
| 53 | 566 | beerschot-ac | Beerschot AC | BE1 | NaN | 0 | NaN | 0 | NaN | 0 | Olympisch Stadion | 12771 | +-0 | NaN | 2012 | ../data/raw/transfermarkt-scraper/2012/clubs.j... | https://www.transfermarkt.co.uk/beerschot-ac/s... |
| 68 | 1411 | raec-mons | RAEC Mons (- 2015) | BE1 | NaN | 0 | NaN | 0 | NaN | 0 | Stade Charles Tondreau | 12662 | +-0 | NaN | 2013 | ../data/raw/transfermarkt-scraper/2013/clubs.j... | https://www.transfermarkt.co.uk/raec-mons/star... |
cels_ausentes4 = competicoes[competicoes['country_name'].isnull()]
cels_ausentes4.head()
| competition_id | competition_code | name | sub_type | type | country_id | country_name | domestic_league_code | confederation | url | |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | USC | uefa-super-cup | uefa-super-cup | uefa_super_cup | other | -1 | NaN | NaN | europa | https://www.transfermarkt.co.uk/uefa-super-cup... |
| 8 | EL | europa-league | europa-league | europa_league | international_cup | -1 | NaN | NaN | europa | https://www.transfermarkt.co.uk/europa-league/... |
| 17 | ELQ | europa-league-qualifikation | europa-league-qualifikation | europa_league_qualifying | international_cup | -1 | NaN | NaN | europa | https://www.transfermarkt.co.uk/europa-league-... |
| 20 | ECLQ | uefa-europa-conference-league-qualifikation | uefa-europa-conference-league-qualifikation | uefa_europa_conference_league_qualifiers | international_cup | -1 | NaN | NaN | europa | https://www.transfermarkt.co.uk/uefa-europa-co... |
| 26 | CL | uefa-champions-league | uefa-champions-league | uefa_champions_league | international_cup | -1 | NaN | NaN | europa | https://www.transfermarkt.co.uk/uefa-champions... |
As mentioned above, there are several missing data in many dataframes.
After printing some of the plots that contain missing data, I drew some conclusions:
There are absent values in country_name that are part of international competitions, so no country exists.
average_age contains absent data because they are part of teams that are not so recognised.
position can contain absent values by precisely the teams inserted also don't make part of top scale teams.
I only removed one missing item on the dataframe I want to study 'presencas'. After reflection, I do not think it is necessary to make any changes to the missing data, at least at this time. The main dataframe that I intend to study contains all the data.
If necessary, I will replace or remove missing data.
I will change to date format some of the types of data that refer precisely to the date.
presencas['date'] = pd.to_datetime(presencas['date'], format="%Y/%m/%d")
valor['date'] = pd.to_datetime(valor['date'], format="%Y/%m/%d")
jogadores['date'] = pd.to_datetime(eventos['date'], format="%Y/%m/%d")
eventos['date'] = pd.to_datetime(eventos['date'], format="%Y/%m/%d")
As the dataframes we load require us to use a lot of memory from our system, we will change some of the types of data so that it optimizes the memory usage considerably to the system.
presencas['player_name'] = presencas['player_name'].astype('category')
presencas['competition_id'] = presencas['competition_id'].astype('category')
clube_jogos['own_manager_name'] = clube_jogos['own_manager_name'].astype('category')
clube_jogos['opponent_manager_name'] = clube_jogos['opponent_manager_name'].astype('category')
clube_jogos['hosting'] = clube_jogos['hosting'].astype('category')
clubes['club_code'] = clubes['club_code'].astype('category')
clubes['name'] = clubes['name'].astype('category')
eventos['type'] = eventos['type'].astype('category')
eventos['description'] = eventos['description'].astype('category')
escalados['type'] = escalados['type'].astype('category')
escalados['player_name'] = escalados['player_name'].astype('category')
escalados['position'] = escalados['position'].astype('category')
jogos['competition_id'] = jogos['competition_id'].astype('category')
jogos['round'] = jogos['round'].astype('category')
jogos['referee'] = jogos['referee'].astype('category')
jogos['home_club_name'] = jogos['home_club_name'].astype('category')
jogos['away_club_name'] = jogos['away_club_name'].astype('category')
jogos['competition_type'] = jogos['competition_type'].astype('category')
valor['player_club_domestic_competition_id'] = valor['player_club_domestic_competition_id'].astype('category')
jogadores['first_name'] = jogadores['first_name'].astype('category')
jogadores['last_name'] = jogadores['last_name'].astype('category')
jogadores['name'] = jogadores['name'].astype('category')
jogadores['player_code'] = jogadores['player_code'].astype('category')
jogadores['country_of_birth'] = jogadores['country_of_birth'].astype('category')
jogadores['city_of_birth'] = jogadores['city_of_birth'].astype('category')
jogadores['country_of_citizenship'] = jogadores['country_of_citizenship'].astype('category')
jogadores['agent_name'] = jogadores['agent_name'].astype('category')
jogadores['current_club_domestic_competition_id'] = jogadores['current_club_domestic_competition_id'].astype('category')
jogadores['current_club_name'] = jogadores['current_club_name'].astype('category')
jogadores.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 30383 entries, 0 to 30382 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 player_id 30383 non-null int64 1 first_name 28403 non-null category 2 last_name 30383 non-null category 3 name 30383 non-null category 4 last_season 30383 non-null int64 5 current_club_id 30383 non-null int64 6 player_code 30383 non-null category 7 country_of_birth 27697 non-null category 8 city_of_birth 28196 non-null category 9 country_of_citizenship 29829 non-null category 10 date_of_birth 30337 non-null object 11 sub_position 30209 non-null object 12 position 30383 non-null object 13 foot 28013 non-null object 14 height_in_cm 28306 non-null float64 15 contract_expiration_date 18924 non-null object 16 agent_name 15060 non-null category 17 image_url 30383 non-null object 18 url 30383 non-null object 19 current_club_domestic_competition_id 30383 non-null category 20 current_club_name 30383 non-null category 21 market_value_in_eur 29260 non-null float64 22 highest_market_value_in_eur 29260 non-null float64 23 date 30383 non-null datetime64[ns] dtypes: category(10), datetime64[ns](1), float64(3), int64(3), object(7) memory usage: 28.9 MB
Just an example:
After some changes made to the various dataframes, the time has come to start analysing the data.
In the first instance I will print some relevant information and then groupby players with goals, assists, how many minutes they played and cards.
presencas.head()
| appearance_id | game_id | player_id | player_club_id | player_current_club_id | date | player_name | competition_id | yellow_cards | red_cards | goals | assists | minutes_played | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2231978_38004 | 2231978 | 38004 | 853 | 235 | 2012-07-03 | Aurélien Joachim | CLQ | 0 | 0 | 2 | 0 | 90 |
| 1 | 2233748_79232 | 2233748 | 79232 | 8841 | 2698 | 2012-07-05 | Ruslan Abyshov | ELQ | 0 | 0 | 0 | 0 | 90 |
| 2 | 2234413_42792 | 2234413 | 42792 | 6251 | 465 | 2012-07-05 | Sander Puri | ELQ | 0 | 0 | 0 | 0 | 45 |
| 3 | 2234418_73333 | 2234418 | 73333 | 1274 | 6646 | 2012-07-05 | Vegar Hedenstad | ELQ | 0 | 0 | 0 | 0 | 90 |
| 4 | 2234421_122011 | 2234421 | 122011 | 195 | 3008 | 2012-07-05 | Markus Henriksen | ELQ | 0 | 0 | 0 | 1 | 90 |
presencas.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1524125 entries, 0 to 1524125 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524125 non-null object 1 game_id 1524125 non-null int64 2 player_id 1524125 non-null int64 3 player_club_id 1524125 non-null int64 4 player_current_club_id 1524125 non-null int64 5 date 1524125 non-null datetime64[ns] 6 player_name 1524125 non-null category 7 competition_id 1524125 non-null category 8 yellow_cards 1524125 non-null int64 9 red_cards 1524125 non-null int64 10 goals 1524125 non-null int64 11 assists 1524125 non-null int64 12 minutes_played 1524125 non-null int64 dtypes: category(2), datetime64[ns](1), int64(9), object(1) memory usage: 144.6+ MB
presencas.shape
(1524125, 13)
#checking that we don't have any missing values
presencas.isnull().sum()
appearance_id 0 game_id 0 player_id 0 player_club_id 0 player_current_club_id 0 date 0 player_name 0 competition_id 0 yellow_cards 0 red_cards 0 goals 0 assists 0 minutes_played 0 dtype: int64
presencas.describe()
| game_id | player_id | player_club_id | player_current_club_id | yellow_cards | red_cards | goals | assists | minutes_played | |
|---|---|---|---|---|---|---|---|---|---|
| count | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 | 1.524125e+06 |
| mean | 3.033598e+06 | 1.833277e+05 | 2.977173e+03 | 3.619215e+03 | 1.487247e-01 | 3.815960e-03 | 9.630116e-02 | 7.590322e-02 | 6.959456e+01 |
| std | 5.694835e+05 | 1.682505e+05 | 7.550138e+03 | 9.226995e+03 | 3.671985e-01 | 6.165550e-02 | 3.317028e-01 | 2.864926e-01 | 2.978741e+01 |
| min | 2.211607e+06 | 1.000000e+01 | 1.000000e+00 | 3.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 |
| 25% | 2.532476e+06 | 5.492800e+04 | 2.810000e+02 | 3.360000e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 5.100000e+01 |
| 50% | 2.942764e+06 | 1.291290e+05 | 8.330000e+02 | 9.310000e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 9.000000e+01 |
| 75% | 3.461749e+06 | 2.651530e+05 | 2.441000e+03 | 2.687000e+03 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 9.000000e+01 |
| max | 4.271739e+06 | 1.186012e+06 | 1.016340e+05 | 8.367800e+04 | 2.000000e+00 | 1.000000e+00 | 6.000000e+00 | 6.000000e+00 | 1.350000e+02 |
presencas.nunique()
appearance_id 1524125 game_id 60114 player_id 23858 player_club_id 1022 player_current_club_id 424 date 3380 player_name 23363 competition_id 43 yellow_cards 3 red_cards 2 goals 7 assists 7 minutes_played 121 dtype: int64
presencas["date"].max()
Timestamp('2024-01-22 00:00:00')
presencas["date"].min()
Timestamp('2012-07-03 00:00:00')
So we can see that our dataframe contains info from almost 12 years
presencas_novo = presencas.groupby(['player_name'])[['goals', 'assists', 'yellow_cards', 'red_cards', 'minutes_played']].agg(['sum'])
presencas_novo.reset_index(inplace=True)
presencas_novo.head()
| player_name | goals | assists | yellow_cards | red_cards | minutes_played | |
|---|---|---|---|---|---|---|
| sum | sum | sum | sum | sum | ||
| 0 | A.J. Soares | 1 | 0 | 2 | 0 | 421 |
| 1 | AJ Leitch-Smith | 7 | 6 | 1 | 0 | 1700 |
| 2 | Aaron Appindangoyé | 1 | 2 | 12 | 0 | 8084 |
| 3 | Aaron Bastiaans | 2 | 0 | 0 | 0 | 144 |
| 4 | Aaron Boupendza | 22 | 1 | 7 | 0 | 2586 |
#change column names in our new dataframe
presencas_novo.columns = ['player_name', 'total_goals', 'total_assists', 'total_yellow_cards', 'total_red_cards', 'total_minutes_played']
presencas_novo.head()
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | |
|---|---|---|---|---|---|---|
| 0 | A.J. Soares | 1 | 0 | 2 | 0 | 421 |
| 1 | AJ Leitch-Smith | 7 | 6 | 1 | 0 | 1700 |
| 2 | Aaron Appindangoyé | 1 | 2 | 12 | 0 | 8084 |
| 3 | Aaron Bastiaans | 2 | 0 | 0 | 0 | 144 |
| 4 | Aaron Boupendza | 22 | 1 | 7 | 0 | 2586 |
presencas_novo.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 23363 entries, 0 to 23362 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 player_name 23363 non-null category 1 total_goals 23363 non-null int64 2 total_assists 23363 non-null int64 3 total_yellow_cards 23363 non-null int64 4 total_red_cards 23363 non-null int64 5 total_minutes_played 23363 non-null int64 dtypes: category(1), int64(5) memory usage: 1.6 MB
Let's check out who were the 20 top goalscorers
golos_20 = presencas_novo.sort_values(by="total_goals", ascending=False).head(20)
golos_20
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | |
|---|---|---|---|---|---|---|
| 18599 | Robert Lewandowski | 453 | 111 | 53 | 1 | 46014 |
| 12804 | Lionel Messi | 443 | 219 | 49 | 1 | 43543 |
| 4264 | Cristiano Ronaldo | 432 | 114 | 64 | 4 | 41150 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 |
| 8635 | Harry Kane | 294 | 69 | 45 | 0 | 34993 |
| 17896 | Pierre-Emerick Aubameyang | 277 | 76 | 33 | 1 | 36593 |
| 15612 | Mohamed Salah | 254 | 117 | 19 | 0 | 36834 |
| 18891 | Romelu Lukaku | 250 | 75 | 43 | 1 | 36808 |
| 4030 | Ciro Immobile | 250 | 66 | 71 | 2 | 33400 |
| 11614 | Karim Benzema | 249 | 121 | 7 | 0 | 34639 |
| 5998 | Edinson Cavani | 232 | 47 | 48 | 0 | 27906 |
| 22514 | Wissam Ben Yedder | 228 | 70 | 26 | 0 | 31960 |
| 12318 | Kylian Mbappé | 228 | 97 | 40 | 1 | 23155 |
| 19867 | Sergio Agüero | 220 | 57 | 27 | 1 | 22962 |
| 1907 | Antoine Griezmann | 213 | 99 | 58 | 2 | 38927 |
| 16180 | Neymar | 212 | 143 | 93 | 2 | 28364 |
| 995 | Alexandre Lacazette | 209 | 70 | 51 | 2 | 33513 |
| 5994 | Edin Dzeko | 207 | 100 | 60 | 2 | 34558 |
| 14782 | Mauro Icardi | 203 | 55 | 21 | 0 | 27983 |
| 23116 | Zlatan Ibrahimović | 196 | 78 | 42 | 4 | 21715 |
# Plotting the bar chart
sorted_data = golos_20.sort_values(by='total_goals', ascending=True)
plt.barh(sorted_data['player_name'], sorted_data['total_goals'], color='blue')
plt.xlabel('Number of Goals')
plt.ylabel('Players')
plt.title('Top 20 Goal scorers')
plt.xticks(rotation=45)
plt.show()
Let's check who's part of the 20 biggest assisters
assistencias_20 = presencas_novo.sort_values(by="total_assists", ascending=False).head(20)
assistencias_20
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | |
|---|---|---|---|---|---|---|
| 12804 | Lionel Messi | 443 | 219 | 49 | 1 | 43543 |
| 11840 | Kevin De Bruyne | 117 | 195 | 48 | 0 | 35054 |
| 21105 | Thomas Müller | 166 | 190 | 26 | 1 | 34384 |
| 5883 | Dusan Tadic | 171 | 188 | 47 | 0 | 40523 |
| 23237 | Ángel Di María | 117 | 163 | 35 | 4 | 31286 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 |
| 16180 | Neymar | 212 | 143 | 93 | 2 | 28364 |
| 8549 | Hakim Ziyech | 114 | 135 | 54 | 0 | 29480 |
| 3847 | Christian Eriksen | 92 | 131 | 28 | 0 | 35309 |
| 5434 | Dimitri Payet | 96 | 128 | 55 | 3 | 29707 |
| 13935 | Marco Reus | 167 | 124 | 40 | 1 | 30913 |
| 18121 | Raheem Sterling | 160 | 122 | 60 | 0 | 36463 |
| 11614 | Karim Benzema | 249 | 121 | 7 | 0 | 34639 |
| 4966 | David Silva | 70 | 120 | 48 | 1 | 31617 |
| 5842 | Dries Mertens | 177 | 119 | 58 | 1 | 30477 |
| 8742 | Henrikh Mkhitaryan | 128 | 119 | 58 | 0 | 35698 |
| 20613 | Steven Berghuis | 133 | 117 | 60 | 2 | 29578 |
| 15612 | Mohamed Salah | 254 | 117 | 19 | 0 | 36834 |
| 3242 | Bruno Fernandes | 136 | 116 | 100 | 0 | 35238 |
| 17554 | Paulinho | 166 | 115 | 180 | 4 | 71566 |
# Plotting the bar chart
sorted_data_assists = assistencias_20.sort_values(by='total_assists', ascending=True)
plt.barh(sorted_data_assists['player_name'], sorted_data_assists['total_assists'], color='blue')
plt.xlabel('Number of Assists')
plt.ylabel('Players')
plt.title('Top 20 Goal Assisters')
plt.xticks(rotation=45)
plt.show()
presencas_novo["total_goals_assists"] = presencas_novo["total_goals"] + presencas_novo["total_assists"]
presencas_novo_goals_assists = presencas_novo.sort_values(by="total_goals_assists", ascending=False).head(20)
presencas_novo_goals_assists['total_goals_assists'] = presencas_novo_goals_assists['total_goals_assists'].astype(str)
presencas_novo_goals_assists = presencas_novo_goals_assists.sort_values(by="total_goals_assists", ascending=True)
most_20_total_goals_assists = px.bar(presencas_novo_goals_assists, x=["total_goals", "total_assists"], y='player_name', orientation='h', title="Top 20 Players by Total Goal Contribution (Goals + Assists)" , text="total_goals_assists")
most_20_total_goals_assists.show()
most_20_total_goals_assists.write_image("plot.png", engine="kaleido")
presencas_novo_goals_assists.head()
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | |
|---|---|---|---|---|---|---|---|
| 13935 | Marco Reus | 167 | 124 | 40 | 1 | 30913 | 291 |
| 5842 | Dries Mertens | 177 | 119 | 58 | 1 | 30477 | 296 |
| 22514 | Wissam Ben Yedder | 228 | 70 | 26 | 0 | 31960 | 298 |
| 5994 | Edin Dzeko | 207 | 100 | 60 | 2 | 34558 | 307 |
| 11840 | Kevin De Bruyne | 117 | 195 | 48 | 0 | 35054 | 312 |
presencas_novo_goals_assists['goal_per_minute'] = presencas_novo_goals_assists['total_minutes_played'] / presencas_novo_goals_assists['total_goals']
presencas_novo_goals_assists['goal_per_minute'] = presencas_novo_goals_assists['goal_per_minute'].round(2)
goals_minute = presencas_novo_goals_assists.sort_values(by="goal_per_minute", ascending=True).head(20)
goals_minute
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | goal_per_minute | |
|---|---|---|---|---|---|---|---|---|
| 4264 | Cristiano Ronaldo | 432 | 114 | 64 | 4 | 41150 | 546 | 95.25 |
| 12804 | Lionel Messi | 443 | 219 | 49 | 1 | 43543 | 662 | 98.29 |
| 12318 | Kylian Mbappé | 228 | 97 | 40 | 1 | 23155 | 325 | 101.56 |
| 18599 | Robert Lewandowski | 453 | 111 | 53 | 1 | 46014 | 564 | 101.58 |
| 8635 | Harry Kane | 294 | 69 | 45 | 0 | 34993 | 363 | 119.02 |
| 17896 | Pierre-Emerick Aubameyang | 277 | 76 | 33 | 1 | 36593 | 353 | 132.10 |
| 4030 | Ciro Immobile | 250 | 66 | 71 | 2 | 33400 | 316 | 133.60 |
| 16180 | Neymar | 212 | 143 | 93 | 2 | 28364 | 355 | 133.79 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 | 465 | 138.53 |
| 11614 | Karim Benzema | 249 | 121 | 7 | 0 | 34639 | 370 | 139.11 |
| 22514 | Wissam Ben Yedder | 228 | 70 | 26 | 0 | 31960 | 298 | 140.18 |
| 15612 | Mohamed Salah | 254 | 117 | 19 | 0 | 36834 | 371 | 145.02 |
| 18891 | Romelu Lukaku | 250 | 75 | 43 | 1 | 36808 | 325 | 147.23 |
| 5994 | Edin Dzeko | 207 | 100 | 60 | 2 | 34558 | 307 | 166.95 |
| 5842 | Dries Mertens | 177 | 119 | 58 | 1 | 30477 | 296 | 172.19 |
| 1907 | Antoine Griezmann | 213 | 99 | 58 | 2 | 38927 | 312 | 182.76 |
| 13935 | Marco Reus | 167 | 124 | 40 | 1 | 30913 | 291 | 185.11 |
| 21105 | Thomas Müller | 166 | 190 | 26 | 1 | 34384 | 356 | 207.13 |
| 5883 | Dusan Tadic | 171 | 188 | 47 | 0 | 40523 | 359 | 236.98 |
| 11840 | Kevin De Bruyne | 117 | 195 | 48 | 0 | 35054 | 312 | 299.61 |
# Plotting the bar chart
sorted_data_ga = goals_minute.sort_values(by='goal_per_minute', ascending = False)
plt.barh(sorted_data_ga['player_name'], sorted_data_ga['goal_per_minute'], color='blue')
plt.xlabel('Goal per minute')
plt.ylabel('Players')
plt.title('Top 20 Players by goal per minute')
plt.xticks(rotation=45)
plt.show()
presencas_novo_goals_assists['assist_per_minute'] = presencas_novo_goals_assists['total_minutes_played'] / presencas_novo_goals_assists['total_assists']
presencas_novo_goals_assists['assist_per_minute'] = presencas_novo_goals_assists['assist_per_minute'].round(2)
assist_minute = presencas_novo_goals_assists.sort_values(by="assist_per_minute", ascending=True).head(20)
assist_minute
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | goal_per_minute | assist_per_minute | |
|---|---|---|---|---|---|---|---|---|---|
| 11840 | Kevin De Bruyne | 117 | 195 | 48 | 0 | 35054 | 312 | 299.61 | 179.76 |
| 21105 | Thomas Müller | 166 | 190 | 26 | 1 | 34384 | 356 | 207.13 | 180.97 |
| 16180 | Neymar | 212 | 143 | 93 | 2 | 28364 | 355 | 133.79 | 198.35 |
| 12804 | Lionel Messi | 443 | 219 | 49 | 1 | 43543 | 662 | 98.29 | 198.83 |
| 5883 | Dusan Tadic | 171 | 188 | 47 | 0 | 40523 | 359 | 236.98 | 215.55 |
| 12318 | Kylian Mbappé | 228 | 97 | 40 | 1 | 23155 | 325 | 101.56 | 238.71 |
| 13935 | Marco Reus | 167 | 124 | 40 | 1 | 30913 | 291 | 185.11 | 249.30 |
| 5842 | Dries Mertens | 177 | 119 | 58 | 1 | 30477 | 296 | 172.19 | 256.11 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 | 465 | 138.53 | 271.77 |
| 11614 | Karim Benzema | 249 | 121 | 7 | 0 | 34639 | 370 | 139.11 | 286.27 |
| 15612 | Mohamed Salah | 254 | 117 | 19 | 0 | 36834 | 371 | 145.02 | 314.82 |
| 5994 | Edin Dzeko | 207 | 100 | 60 | 2 | 34558 | 307 | 166.95 | 345.58 |
| 4264 | Cristiano Ronaldo | 432 | 114 | 64 | 4 | 41150 | 546 | 95.25 | 360.96 |
| 1907 | Antoine Griezmann | 213 | 99 | 58 | 2 | 38927 | 312 | 182.76 | 393.20 |
| 18599 | Robert Lewandowski | 453 | 111 | 53 | 1 | 46014 | 564 | 101.58 | 414.54 |
| 22514 | Wissam Ben Yedder | 228 | 70 | 26 | 0 | 31960 | 298 | 140.18 | 456.57 |
| 17896 | Pierre-Emerick Aubameyang | 277 | 76 | 33 | 1 | 36593 | 353 | 132.10 | 481.49 |
| 18891 | Romelu Lukaku | 250 | 75 | 43 | 1 | 36808 | 325 | 147.23 | 490.77 |
| 4030 | Ciro Immobile | 250 | 66 | 71 | 2 | 33400 | 316 | 133.60 | 506.06 |
| 8635 | Harry Kane | 294 | 69 | 45 | 0 | 34993 | 363 | 119.02 | 507.14 |
# Plotting the bar chart
sorted_data_a = assist_minute.sort_values(by='assist_per_minute', ascending = False)
plt.barh(sorted_data_a['player_name'], sorted_data_a['assist_per_minute'], color='blue')
plt.xlabel('Assist per minute')
plt.ylabel('Players')
plt.title('Top 20 Players by assist per minute')
plt.xticks(rotation=45)
plt.show()
presencas_novo_goals_assists["total_goals_assists_per_minute"] = presencas_novo_goals_assists["goal_per_minute"] + presencas_novo_goals_assists["assist_per_minute"]
presencas_novo_goals_assists = presencas_novo_goals_assists.sort_values(by="total_goals_assists_per_minute", ascending=False).head(20)
presencas_novo_goals_assists['total_goals_assists_per_minute'] = presencas_novo_goals_assists['total_goals_assists_per_minute'].astype(str)
presencas_novo_goals_assists = presencas_novo_goals_assists.sort_values(by="total_goals_assists_per_minute", ascending=False)
most_20_total_goals_assists_per_minute = px.bar(presencas_novo_goals_assists, x=["goal_per_minute", "assist_per_minute"], y='player_name', orientation='h', title="Top 20 Players by Total Goal Contribution per minute (Goals + Assists)" , text="total_goals_assists_per_minute")
most_20_total_goals_assists_per_minute.show()
#check which players are in the list of best goalscorers and assisters
merged_df = pd.merge(golos_20, assistencias_20, on='player_name', how='inner')
players_in_both = merged_df['player_name'].tolist()
players_in_both
['Lionel Messi', 'Luis Suárez', 'Mohamed Salah', 'Karim Benzema', 'Neymar']
After some studies and reading graphs we conclude that:
The 3 best scorers are Robert Lewandowski, Lionel Messi and Cristiano Ronaldo.
Of the twenty best goalscorers, all play forward.
Only Messi, Suarez, Salah, Benzema and Neymar are both on the list of best scorer and assistant.
Messi is the player who contributes the most to the goal (662 actions for a goal)
Cristiano Ronaldo is the player who needs the least time to score a goal (95.25 minutes to score a goal. Almost 1 goal per game)
Kevin De Bruyne needs only 179.76 minutes to make an assistance.
Messi is the player who needs the least time to contribute to the goal.
Despite being the player who needs the fewest minutes to score, Ronaldo needs an average of 456.21 minutes to contribute to the goal.
amarelos_20 = presencas_novo.sort_values(by="total_yellow_cards", ascending=False).head(20)
amarelos_20
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | |
|---|---|---|---|---|---|---|---|
| 4655 | Danilo | 106 | 70 | 201 | 4 | 80523 | 176 |
| 17554 | Paulinho | 166 | 115 | 180 | 4 | 71566 | 281 |
| 13851 | Marcelo | 52 | 72 | 172 | 8 | 76472 | 124 |
| 7058 | Fernando | 58 | 43 | 168 | 3 | 50717 | 101 |
| 19906 | Sergio Ramos | 70 | 19 | 140 | 6 | 36455 | 89 |
| 19877 | Sergio Busquets | 11 | 33 | 136 | 1 | 41597 | 44 |
| 6989 | Felipe | 20 | 9 | 133 | 7 | 38048 | 29 |
| 13957 | Marco Verratti | 10 | 51 | 128 | 1 | 26861 | 61 |
| 4411 | Damián Suárez | 6 | 22 | 123 | 3 | 26048 | 28 |
| 8299 | Granit Xhaka | 30 | 38 | 122 | 5 | 36790 | 68 |
| 23210 | Álvaro González | 10 | 14 | 121 | 3 | 31011 | 24 |
| 4505 | Daniel Carvajal | 11 | 66 | 121 | 1 | 34236 | 77 |
| 18261 | Raúl García | 112 | 48 | 118 | 1 | 28529 | 160 |
| 4464 | Dani Parejo | 74 | 89 | 118 | 2 | 42266 | 163 |
| 14274 | Marten de Roon | 29 | 24 | 117 | 1 | 38109 | 53 |
| 19034 | Rubén Pérez | 2 | 7 | 117 | 2 | 25450 | 9 |
| 11560 | Kalidou Koulibaly | 19 | 13 | 116 | 4 | 36578 | 32 |
| 22307 | Víctor Sánchez | 11 | 21 | 115 | 1 | 19852 | 32 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 | 465 |
| 8379 | Guilherme | 41 | 51 | 111 | 2 | 60474 | 92 |
# Plotting the bar chart
sorted_data_yellow = amarelos_20.sort_values(by='total_yellow_cards', ascending=True)
plt.barh(sorted_data_yellow['player_name'], sorted_data_yellow['total_yellow_cards'], color='yellow')
plt.xlabel('Number of Yellow Cards')
plt.ylabel('Players')
plt.title('Top 20 Players with most Yellow Cards')
plt.xticks(rotation=45)
plt.show()
vermelhos_20 = presencas_novo.sort_values(by="total_red_cards", ascending=False).head(20)
vermelhos_20
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | |
|---|---|---|---|---|---|---|---|
| 13851 | Marcelo | 52 | 72 | 172 | 8 | 76472 | 124 |
| 6989 | Felipe | 20 | 9 | 133 | 7 | 38048 | 29 |
| 6876 | Fallou Diagne | 7 | 2 | 35 | 7 | 11718 | 9 |
| 22864 | Younès Belhanda | 62 | 58 | 77 | 6 | 24483 | 120 |
| 8808 | Hilton | 9 | 6 | 32 | 6 | 25446 | 15 |
| 14044 | Mario Balotelli | 113 | 20 | 90 | 6 | 17839 | 133 |
| 4508 | Daniel Congré | 10 | 7 | 32 | 6 | 24625 | 17 |
| 21607 | Téji Savanier | 40 | 36 | 34 | 6 | 13132 | 76 |
| 6949 | Federico Fazio | 24 | 6 | 55 | 6 | 26739 | 30 |
| 18110 | Rafik Halliche | 2 | 2 | 22 | 6 | 6665 | 4 |
| 19906 | Sergio Ramos | 70 | 19 | 140 | 6 | 36455 | 89 |
| 5753 | Domenico Berardi | 131 | 88 | 88 | 6 | 26263 | 219 |
| 10498 | Joey van den Berg | 11 | 13 | 35 | 5 | 9811 | 24 |
| 952 | Alexander Djiku | 6 | 3 | 62 | 5 | 20419 | 9 |
| 20476 | Stefan Mitrovic | 13 | 4 | 64 | 5 | 24967 | 17 |
| 21094 | Thomas Lam | 19 | 8 | 44 | 5 | 15533 | 27 |
| 21891 | Vedad Ibisevic | 94 | 31 | 54 | 5 | 18445 | 125 |
| 2925 | Birger Verstraete | 12 | 8 | 55 | 5 | 15031 | 20 |
| 17404 | Papy Djilobodji | 21 | 7 | 69 | 5 | 24267 | 28 |
| 11197 | Joãozinho | 50 | 86 | 71 | 5 | 39669 | 136 |
# Plotting the bar chart
sorted_data_red = vermelhos_20.sort_values(by='total_red_cards', ascending=True)
plt.barh(sorted_data_red['player_name'], sorted_data_red['total_red_cards'], color='red')
plt.xlabel('Number of Red Cards')
plt.ylabel('Players')
plt.title('Top 20 Players with most Red Cards')
plt.xticks(rotation=45)
plt.show()
presencas_novo['total_cards'] = presencas_novo['total_yellow_cards'] + presencas_novo['total_red_cards']
total_20 = presencas_novo.sort_values(by="total_cards", ascending=False).head(20)
total_20
| player_name | total_goals | total_assists | total_yellow_cards | total_red_cards | total_minutes_played | total_goals_assists | total_cards | |
|---|---|---|---|---|---|---|---|---|
| 4655 | Danilo | 106 | 70 | 201 | 4 | 80523 | 176 | 205 |
| 17554 | Paulinho | 166 | 115 | 180 | 4 | 71566 | 281 | 184 |
| 13851 | Marcelo | 52 | 72 | 172 | 8 | 76472 | 124 | 180 |
| 7058 | Fernando | 58 | 43 | 168 | 3 | 50717 | 101 | 171 |
| 19906 | Sergio Ramos | 70 | 19 | 140 | 6 | 36455 | 89 | 146 |
| 6989 | Felipe | 20 | 9 | 133 | 7 | 38048 | 29 | 140 |
| 19877 | Sergio Busquets | 11 | 33 | 136 | 1 | 41597 | 44 | 137 |
| 13957 | Marco Verratti | 10 | 51 | 128 | 1 | 26861 | 61 | 129 |
| 8299 | Granit Xhaka | 30 | 38 | 122 | 5 | 36790 | 68 | 127 |
| 4411 | Damián Suárez | 6 | 22 | 123 | 3 | 26048 | 28 | 126 |
| 23210 | Álvaro González | 10 | 14 | 121 | 3 | 31011 | 24 | 124 |
| 4505 | Daniel Carvajal | 11 | 66 | 121 | 1 | 34236 | 77 | 122 |
| 11560 | Kalidou Koulibaly | 19 | 13 | 116 | 4 | 36578 | 32 | 120 |
| 4464 | Dani Parejo | 74 | 89 | 118 | 2 | 42266 | 163 | 120 |
| 18261 | Raúl García | 112 | 48 | 118 | 1 | 28529 | 160 | 119 |
| 19034 | Rubén Pérez | 2 | 7 | 117 | 2 | 25450 | 9 | 119 |
| 14274 | Marten de Roon | 29 | 24 | 117 | 1 | 38109 | 53 | 118 |
| 22307 | Víctor Sánchez | 11 | 21 | 115 | 1 | 19852 | 32 | 116 |
| 11177 | João Pedro | 150 | 48 | 111 | 4 | 46671 | 198 | 115 |
| 13187 | Luis Suárez | 308 | 157 | 114 | 1 | 42668 | 465 | 115 |
# Plotting the bar chart
sorted_data_total_cards = total_20.sort_values(by='total_cards', ascending=True)
plt.barh(sorted_data_total_cards['player_name'], sorted_data_total_cards['total_cards'], color='green')
plt.xlabel('Number of Cards')
plt.ylabel('Players')
plt.title('Top 20 Players with most Cards')
plt.xticks(rotation=45)
plt.show()
#check which players are in the list of best goalscorers and assisters
merged_cards = pd.merge(amarelos_20, vermelhos_20, on='player_name', how='inner')
players_in_both1 = merged_cards['player_name'].tolist()
players_in_both1
['Marcelo', 'Sergio Ramos', 'Felipe']
After analysing the graphs we can see that:
Most of the players who appear in the graphs play in a more retreated part of the field (Defenders, Defensive Midfield)
Luis Suarez who is a forward, surprisingly figures on the list of players with more yellows
Only 3 players appear in the top 20 of players with more yellow and red cards given.
In this topic I will compare how was the contribution to goal over the years of Ronaldo, Messi and Lewandowski.
presencas.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1524125 entries, 0 to 1524125 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524125 non-null object 1 game_id 1524125 non-null int64 2 player_id 1524125 non-null int64 3 player_club_id 1524125 non-null int64 4 player_current_club_id 1524125 non-null int64 5 date 1524125 non-null datetime64[ns] 6 player_name 1524125 non-null category 7 competition_id 1524125 non-null category 8 yellow_cards 1524125 non-null int64 9 red_cards 1524125 non-null int64 10 goals 1524125 non-null int64 11 assists 1524125 non-null int64 12 minutes_played 1524125 non-null int64 dtypes: category(2), datetime64[ns](1), int64(9), object(1) memory usage: 144.6+ MB
presencas["year"] = presencas["date"].dt.year
presencas["month"] = presencas["date"].dt.month
presencas.head()
| appearance_id | game_id | player_id | player_club_id | player_current_club_id | date | player_name | competition_id | yellow_cards | red_cards | goals | assists | minutes_played | year | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2231978_38004 | 2231978 | 38004 | 853 | 235 | 2012-07-03 | Aurélien Joachim | CLQ | 0 | 0 | 2 | 0 | 90 | 2012 | 7 |
| 1 | 2233748_79232 | 2233748 | 79232 | 8841 | 2698 | 2012-07-05 | Ruslan Abyshov | ELQ | 0 | 0 | 0 | 0 | 90 | 2012 | 7 |
| 2 | 2234413_42792 | 2234413 | 42792 | 6251 | 465 | 2012-07-05 | Sander Puri | ELQ | 0 | 0 | 0 | 0 | 45 | 2012 | 7 |
| 3 | 2234418_73333 | 2234418 | 73333 | 1274 | 6646 | 2012-07-05 | Vegar Hedenstad | ELQ | 0 | 0 | 0 | 0 | 90 | 2012 | 7 |
| 4 | 2234421_122011 | 2234421 | 122011 | 195 | 3008 | 2012-07-05 | Markus Henriksen | ELQ | 0 | 0 | 0 | 1 | 90 | 2012 | 7 |
presencas.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1524125 entries, 0 to 1524125 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 appearance_id 1524125 non-null object 1 game_id 1524125 non-null int64 2 player_id 1524125 non-null int64 3 player_club_id 1524125 non-null int64 4 player_current_club_id 1524125 non-null int64 5 date 1524125 non-null datetime64[ns] 6 player_name 1524125 non-null category 7 competition_id 1524125 non-null category 8 yellow_cards 1524125 non-null int64 9 red_cards 1524125 non-null int64 10 goals 1524125 non-null int64 11 assists 1524125 non-null int64 12 minutes_played 1524125 non-null int64 13 year 1524125 non-null int64 14 month 1524125 non-null int64 dtypes: category(2), datetime64[ns](1), int64(11), object(1) memory usage: 167.8+ MB
presencas.head()
| appearance_id | game_id | player_id | player_club_id | player_current_club_id | date | player_name | competition_id | yellow_cards | red_cards | goals | assists | minutes_played | year | month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2231978_38004 | 2231978 | 38004 | 853 | 235 | 2012-07-03 | Aurélien Joachim | CLQ | 0 | 0 | 2 | 0 | 90 | 2012 | 7 |
| 1 | 2233748_79232 | 2233748 | 79232 | 8841 | 2698 | 2012-07-05 | Ruslan Abyshov | ELQ | 0 | 0 | 0 | 0 | 90 | 2012 | 7 |
| 2 | 2234413_42792 | 2234413 | 42792 | 6251 | 465 | 2012-07-05 | Sander Puri | ELQ | 0 | 0 | 0 | 0 | 45 | 2012 | 7 |
| 3 | 2234418_73333 | 2234418 | 73333 | 1274 | 6646 | 2012-07-05 | Vegar Hedenstad | ELQ | 0 | 0 | 0 | 0 | 90 | 2012 | 7 |
| 4 | 2234421_122011 | 2234421 | 122011 | 195 | 3008 | 2012-07-05 | Markus Henriksen | ELQ | 0 | 0 | 0 | 1 | 90 | 2012 | 7 |
lewa_present = "Robert Lewandowski" in presencas["player_name"].unique()
lewa_present
True
messi_present = "Lionel Messi" in presencas["player_name"].unique()
messi_present
True
ronaldo_present = "Cristiano Ronaldo" in presencas["player_name"].unique()
ronaldo_present
True
result = presencas.groupby(["player_name", "year"])[["goals", "assists"]].agg("sum").reset_index()
# Add a column for the total contribution (goals + assists)
result["contribution"] = result["goals"] + result["assists"]
result
| player_name | year | goals | assists | contribution | |
|---|---|---|---|---|---|
| 0 | A.J. Soares | 2012 | 0 | 0 | 0 |
| 1 | A.J. Soares | 2013 | 0 | 0 | 0 |
| 2 | A.J. Soares | 2014 | 0 | 0 | 0 |
| 3 | A.J. Soares | 2015 | 0 | 0 | 0 |
| 4 | A.J. Soares | 2016 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... |
| 303714 | Αbdul Razak | 2020 | 0 | 0 | 0 |
| 303715 | Αbdul Razak | 2021 | 0 | 0 | 0 |
| 303716 | Αbdul Razak | 2022 | 0 | 0 | 0 |
| 303717 | Αbdul Razak | 2023 | 0 | 0 | 0 |
| 303718 | Αbdul Razak | 2024 | 0 | 0 | 0 |
303719 rows × 5 columns
nomes_desejados = ["Lionel Messi", "Robert Lewandowski", "Cristiano Ronaldo"]
# Filtering the result only for the 3 desired names
result_filtered = result[result["player_name"].isin(nomes_desejados)]
result_filtered
| player_name | year | goals | assists | contribution | |
|---|---|---|---|---|---|
| 55432 | Cristiano Ronaldo | 2012 | 23 | 4 | 27 |
| 55433 | Cristiano Ronaldo | 2013 | 59 | 17 | 76 |
| 55434 | Cristiano Ronaldo | 2014 | 56 | 21 | 77 |
| 55435 | Cristiano Ronaldo | 2015 | 54 | 18 | 72 |
| 55436 | Cristiano Ronaldo | 2016 | 42 | 14 | 56 |
| 55437 | Cristiano Ronaldo | 2017 | 42 | 9 | 51 |
| 55438 | Cristiano Ronaldo | 2018 | 43 | 12 | 55 |
| 55439 | Cristiano Ronaldo | 2019 | 25 | 5 | 30 |
| 55440 | Cristiano Ronaldo | 2020 | 41 | 6 | 47 |
| 55441 | Cristiano Ronaldo | 2021 | 34 | 6 | 40 |
| 55442 | Cristiano Ronaldo | 2022 | 13 | 2 | 15 |
| 55443 | Cristiano Ronaldo | 2023 | 0 | 0 | 0 |
| 55444 | Cristiano Ronaldo | 2024 | 0 | 0 | 0 |
| 166452 | Lionel Messi | 2012 | 35 | 9 | 44 |
| 166453 | Lionel Messi | 2013 | 39 | 12 | 51 |
| 166454 | Lionel Messi | 2014 | 50 | 20 | 70 |
| 166455 | Lionel Messi | 2015 | 48 | 27 | 75 |
| 166456 | Lionel Messi | 2016 | 51 | 27 | 78 |
| 166457 | Lionel Messi | 2017 | 50 | 19 | 69 |
| 166458 | Lionel Messi | 2018 | 42 | 24 | 66 |
| 166459 | Lionel Messi | 2019 | 42 | 14 | 56 |
| 166460 | Lionel Messi | 2020 | 26 | 23 | 49 |
| 166461 | Lionel Messi | 2021 | 34 | 14 | 48 |
| 166462 | Lionel Messi | 2022 | 17 | 24 | 41 |
| 166463 | Lionel Messi | 2023 | 9 | 6 | 15 |
| 166464 | Lionel Messi | 2024 | 0 | 0 | 0 |
| 241787 | Robert Lewandowski | 2012 | 16 | 9 | 25 |
| 241788 | Robert Lewandowski | 2013 | 36 | 13 | 49 |
| 241789 | Robert Lewandowski | 2014 | 22 | 11 | 33 |
| 241790 | Robert Lewandowski | 2015 | 38 | 10 | 48 |
| 241791 | Robert Lewandowski | 2016 | 39 | 6 | 45 |
| 241792 | Robert Lewandowski | 2017 | 44 | 8 | 52 |
| 241793 | Robert Lewandowski | 2018 | 42 | 8 | 50 |
| 241794 | Robert Lewandowski | 2019 | 48 | 9 | 57 |
| 241795 | Robert Lewandowski | 2020 | 45 | 13 | 58 |
| 241796 | Robert Lewandowski | 2021 | 58 | 7 | 65 |
| 241797 | Robert Lewandowski | 2022 | 38 | 8 | 46 |
| 241798 | Robert Lewandowski | 2023 | 24 | 9 | 33 |
| 241799 | Robert Lewandowski | 2024 | 3 | 0 | 3 |
data = {
"Cristiano Ronaldo": {
"years": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024],
"contributions": [27, 76, 77, 72, 56, 51, 55, 30, 47, 40, 15, 0, 0],
},
"Lionel Messi": {
"years": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024],
"contributions": [44, 51, 70, 75, 78, 69, 66, 56, 49, 48, 41, 15, 0],
},
"Robert Lewandowski": {
"years": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024],
"contributions": [25, 49, 33, 48, 45, 52, 50, 57, 58, 65, 46, 33, 3],
}
}
# Plotting the data
bar_width = 0.25
index = np.arange(len(data["Cristiano Ronaldo"]["years"]))
fig, ax = plt.subplots(figsize=(12, 7))
# Bar for Cristiano Ronaldo
bar1 = ax.bar(index - bar_width, data["Cristiano Ronaldo"]["contributions"], bar_width, label='Cristiano Ronaldo')
# Bar for Lionel Messi
bar2 = ax.bar(index, data["Lionel Messi"]["contributions"], bar_width, label='Lionel Messi')
# Bar for Robert Lewandowski
bar3 = ax.bar(index + bar_width, data["Robert Lewandowski"]["contributions"], bar_width, label='Robert Lewandowski')
# Adding labels and title
ax.set_xlabel('Year')
ax.set_ylabel('Contributions (Goals + Assists)')
ax.set_title('Player Contributions Over the Years')
ax.set_xticks(index)
ax.set_xticklabels(data["Cristiano Ronaldo"]["years"])
ax.legend()
# Display the plot
plt.show()
Starting with the year 2023 and 2024, I believe that Ronaldo doesn't have statistics because we don't have Saudi Arabian's statistics, same happens to Messi and MLS.
This graph shows the goal contribution these 3 players had over the years. Note that Lewandowski had more contributions for goal when Ronaldo and Messi started to slow down. If it was for total goals I believe Cristiano would be top more times than just 2. Messi tops with 78 contributions in the year 2016 followed by Ronaldo in 2014 with 77 and 76 in 2013.
I'm going to talk a little about Sporting Clube de Portugal. In this project I'm not going to stretch much but in a next project I want to talk more about Sporting and the Portuguese League. In this analysis I will only talk about the market value of Sporting players, and what's the assistance of the team's games in recent years.
jogos.head()
| game_id | competition_id | season | round | date | home_club_id | away_club_id | home_club_goals | away_club_goals | home_club_position | ... | stadium | attendance | referee | url | home_club_formation | away_club_formation | home_club_name | away_club_name | aggregate | competition_type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2321044 | L1 | 2013 | 2. Matchday | 2013-08-18 | 16 | 23 | 2 | 1 | 1.0 | ... | SIGNAL IDUNA PARK | 80200.0 | Peter Sippel | https://www.transfermarkt.co.uk/borussia-dortm... | 4-2-3-1 | 4-3-2-1 | Borussia Dortmund | Eintracht Braunschweig | 2:1 | domestic_league |
| 1 | 2321060 | L1 | 2013 | 3. Matchday | 2013-08-25 | 23 | 24 | 0 | 2 | 18.0 | ... | EINTRACHT-Stadion | 23325.0 | Wolfgang Stark | https://www.transfermarkt.co.uk/eintracht-brau... | 4-3-2-1 | 4-2-3-1 | Eintracht Braunschweig | Eintracht Frankfurt Fußball AG | 0:2 | domestic_league |
| 2 | 2321086 | L1 | 2013 | 6. Matchday | 2013-09-21 | 4 | 16 | 1 | 1 | 15.0 | ... | Max-Morlock-Stadion | 50000.0 | Knut Kircher | https://www.transfermarkt.co.uk/1-fc-nuremberg... | 4-2-3-1 | 4-2-3-1 | 1.FC Nuremberg | Borussia Dortmund | 1:1 | domestic_league |
| 3 | 2321152 | L1 | 2013 | 11. Matchday | 2013-11-02 | 44 | 33 | 0 | 2 | 7.0 | ... | Olympiastadion Berlin | 69277.0 | Günter Perl | https://www.transfermarkt.co.uk/hertha-bsc_fc-... | 4-2-3-1 | 4-2-3-1 | Hertha BSC | FC Schalke 04 | 0:2 | domestic_league |
| 4 | 2321205 | L1 | 2013 | 17. Matchday | 2013-12-21 | 41 | 39 | 2 | 3 | 14.0 | ... | Volksparkstadion | 50000.0 | Bastian Dankert | https://www.transfermarkt.co.uk/hamburger-sv_1... | 4-2-3-1 | 4-4-2 Diamond | Hamburger SV | 1. Fußball- und Sportverein Mainz 05 | 2:3 | domestic_league |
5 rows × 23 columns
sporting_home = jogos[jogos['home_club_name'] == 'Sporting Clube de Portugal']
sporting_home.head()
| game_id | competition_id | season | round | date | home_club_id | away_club_id | home_club_goals | away_club_goals | home_club_position | ... | stadium | attendance | referee | url | home_club_formation | away_club_formation | home_club_name | away_club_name | aggregate | competition_type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1398 | 2495298 | CL | 2014 | Group G | 2014-09-30 | 336 | 631 | 0 | 1 | NaN | ... | Estádio José Alvalade XXI | 40734.0 | Mateu Lahoz | https://www.transfermarkt.co.uk/spielbericht/i... | 4-3-3 Attacking | 4-2-3-1 | Sporting Clube de Portugal | Chelsea Football Club | 0:1 | international_cup |
| 1405 | 2526197 | PO1 | 2014 | 26. Matchday | 2015-03-22 | 336 | 2420 | 4 | 1 | 3.0 | ... | Estádio José Alvalade XXI | 35979.0 | Jorge Sousa | https://www.transfermarkt.co.uk/sporting-cp_vi... | 4-3-3 Attacking | 4-3-3 Defending | Sporting Clube de Portugal | Vitória Sport Clube | 4:1 | domestic_league |
| 1419 | 2596940 | PO1 | 2015 | 2. Matchday | 2015-08-22 | 336 | 2995 | 1 | 1 | 5.0 | ... | Estádio José Alvalade XXI | 40639.0 | Manuel Oliveira | https://www.transfermarkt.co.uk/sporting-cp_fc... | 4-4-2 double 6 | 4-2-3-1 | Sporting Clube de Portugal | FC Paços de Ferreira | 1:1 | domestic_league |
| 1442 | 2720784 | PO1 | 2016 | 4. Matchday | 2016-09-10 | 336 | 979 | 3 | 0 | 1.0 | ... | Estádio José Alvalade XXI | 44393.0 | Nuno Almeida | https://www.transfermarkt.co.uk/sporting-cp_mo... | 4-4-2 double 6 | 4-3-3 Attacking | Sporting Clube de Portugal | Moreirense Futebol Clube | 3:0 | domestic_league |
| 1456 | 2883759 | PO1 | 2017 | 13. Matchday | 2017-12-01 | 336 | 68608 | 1 | 0 | 2.0 | ... | Estádio José Alvalade XXI | 46881.0 | Nuno Almeida | https://www.transfermarkt.co.uk/sporting-cp_cf... | 4-4-2 Diamond | 4-4-2 double 6 | Sporting Clube de Portugal | CF Os Belenenses | 1:0 | domestic_league |
5 rows × 23 columns
plt.figure(figsize=(10, 6))
plt.plot(average_attendance_per_year, marker='o', linestyle='-', color='b')
plt.title('Average Stadium Attendance Per Year')
plt.xlabel('Year')
plt.ylabel('Average Attendance')
plt.grid(True)
plt.show()
Much can be said about this graph and why so much fluctuation in the average of assistance over the years.
Starting with the year 2012 and 2013, the average of assistance is relatively low because it coincides with Sporting's worst ranking ever in the Portuguese championship, they were seventh that year. The following year there is an increase because with the entry of a new president and some investment in the team allied with good results caused the average of assistance to increase, the same happened in 2015, although there was less assistance compared to the last year.
The 2016 boom is clearly influenced by the signing of Jorge Jesus the main rival's coach, good results, broke the record of points in the championship but still were not champions, finishing in second. The following year, champions league and good performance in the league kept the average of assists very high.
The fall in the following years is due to an attack that took place at Sporting's training centre, which caused many of the players to terminate their contract so that Sporting did not receive any value for many of the club's great assets. The investment was not the best, Sporting was in crisis, people didn't believe that Sporting wouln'd be good that year and it was reflected on assistance.
The year 2021 was the year in which Sporting won the championship again, but the games were played behind closed doors because of the pandemic.
jogadores.head()
| player_id | first_name | last_name | name | last_season | current_club_id | player_code | country_of_birth | city_of_birth | country_of_citizenship | ... | contract_expiration_date | agent_name | image_url | url | current_club_domestic_competition_id | current_club_name | market_value_in_eur | highest_market_value_in_eur | date | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Miroslav | Klose | Miroslav Klose | 2015 | 398 | miroslav-klose | Poland | Opole | Germany | ... | NaN | ASBW Sport Marketing | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/miroslav-klose... | IT1 | Società Sportiva Lazio S.p.A. | 1000000.0 | 30000000.0 | 2012-08-05 | 2012 |
| 1 | 26 | Roman | Weidenfeller | Roman Weidenfeller | 2017 | 16 | roman-weidenfeller | Germany | Diez | Germany | ... | NaN | Neubauer 13 GmbH | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/roman-weidenfe... | L1 | Borussia Dortmund | 750000.0 | 8000000.0 | 2012-08-05 | 2012 |
| 2 | 65 | Dimitar | Berbatov | Dimitar Berbatov | 2015 | 1091 | dimitar-berbatov | Bulgaria | Blagoevgrad | Bulgaria | ... | NaN | CSKA-AS-23 Ltd. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/dimitar-berbat... | GR1 | Panthessalonikios Athlitikos Omilos Konstantin... | 1000000.0 | 34500000.0 | 2012-08-05 | 2012 |
| 3 | 77 | NaN | Lúcio | Lúcio | 2012 | 506 | lucio | Brazil | Brasília | Brazil | ... | NaN | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/lucio/profil/s... | IT1 | Juventus Football Club | 200000.0 | 24500000.0 | 2012-08-05 | 2012 |
| 4 | 80 | Tom | Starke | Tom Starke | 2017 | 27 | tom-starke | East Germany (GDR) | Freital | Germany | ... | NaN | IFM | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/tom-starke/pro... | L1 | FC Bayern München | 100000.0 | 3000000.0 | 2012-08-05 | 2012 |
5 rows × 25 columns
jogadores['year'] = jogadores['date'].dt.year
sporting_players = jogadores[jogadores['current_club_name'] == 'Sporting Clube de Portugal']
sporting_players.head()
| player_id | first_name | last_name | name | last_season | current_club_id | player_code | country_of_birth | city_of_birth | country_of_citizenship | ... | contract_expiration_date | agent_name | image_url | url | current_club_domestic_competition_id | current_club_name | market_value_in_eur | highest_market_value_in_eur | date | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 427 | 5404 | Jérémy | Mathieu | Jérémy Mathieu | 2019 | 336 | jeremy-mathieu | France | Luxeuil-les-Bains | France | ... | NaN | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/jeremy-mathieu... | PO1 | Sporting Clube de Portugal | 1200000.0 | 15000000.0 | 2012-08-19 | 2012 |
| 2284 | 27303 | NaN | Gerson Magrão | Gerson Magrão | 2013 | 336 | gerson-magrao | Brazil | Diadema | Brazil | ... | NaN | GP Sports | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/gerson-magrao/... | PO1 | Sporting Clube de Portugal | 200000.0 | 4000000.0 | 2012-11-25 | 2012 |
| 2289 | 27346 | NaN | Marcelo Boeck | Marcelo Boeck | 2015 | 336 | marcelo-boeck | Brazil | Santa Cruz | Brazil | ... | NaN | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/marcelo-boeck/... | PO1 | Sporting Clube de Portugal | 200000.0 | 1800000.0 | 2012-11-25 | 2012 |
| 3375 | 39642 | Bryan | Ruiz | Bryan Ruiz | 2017 | 336 | bryan-ruiz | Costa Rica | San José | Costa Rica | ... | NaN | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/bryan-ruiz/pro... | PO1 | Sporting Clube de Portugal | 150000.0 | 13000000.0 | 2013-04-21 | 2013 |
| 3743 | 44062 | Antonio | Adán | Antonio Adán | 2023 | 336 | antonio-adan | Spain | Madrid | Spain | ... | 2024-06-30 00:00:00 | Gesport Espizua SL | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/antonio-adan/p... | PO1 | Sporting Clube de Portugal | 800000.0 | 8000000.0 | 2012-07-22 | 2012 |
5 rows × 25 columns
jogadores['last_season'] = pd.to_datetime(jogadores['last_season'], format='%Y')
jogadores['last_season'] = jogadores['last_season'].dt.year
sporting_players_2023 = sporting_players[sporting_players['last_season'] == 2023]
sporting_players_2023
| player_id | first_name | last_name | name | last_season | current_club_id | player_code | country_of_birth | city_of_birth | country_of_citizenship | ... | contract_expiration_date | agent_name | image_url | url | current_club_domestic_competition_id | current_club_name | market_value_in_eur | highest_market_value_in_eur | date | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3743 | 44062 | Antonio | Adán | Antonio Adán | 2023 | 336 | antonio-adan | Spain | Madrid | Spain | ... | 2024-06-30 00:00:00 | Gesport Espizua SL | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/antonio-adan/p... | PO1 | Sporting Clube de Portugal | 800000.0 | 8000000.0 | 2012-07-22 | 2012 |
| 4819 | 56036 | Luís | Neto | Luís Neto | 2023 | 336 | luis-neto | Portugal | Póvoa do Varzim | Portugal | ... | 2024-06-30 00:00:00 | Team of Future | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/luis-neto/prof... | PO1 | Sporting Clube de Portugal | 300000.0 | 6000000.0 | 2012-10-20 | 2012 |
| 7900 | 102427 | Sebastián | Coates | Sebastián Coates | 2023 | 336 | sebastian-coates | Uruguay | Montevideo | Uruguay | ... | 2024-06-30 00:00:00 | InterLex Sport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/sebastian-coat... | PO1 | Sporting Clube de Portugal | 4000000.0 | 15000000.0 | 2012-11-11 | 2012 |
| 9702 | 139395 | Ricardo | Esgaio | Ricardo Esgaio | 2023 | 336 | ricardo-esgaio | Portugal | Nazaré | Portugal | ... | 2026-06-30 00:00:00 | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ricardo-esgaio... | PO1 | Sporting Clube de Portugal | 3500000.0 | 6000000.0 | 2013-03-09 | 2013 |
| 13063 | 211072 | NaN | Paulinho | Paulinho | 2023 | 336 | paulinho | Portugal | Barcelos | Portugal | ... | 2026-06-30 00:00:00 | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/paulinho/profi... | PO1 | Sporting Clube de Portugal | 6500000.0 | 15000000.0 | 2013-05-16 | 2013 |
| 16488 | 288253 | Jerry | St. Juste | Jerry St. Juste | 2023 | 336 | jerry-st-juste | Netherlands | Groningen | Netherlands | ... | 2026-06-30 00:00:00 | St. Juste Management | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/jerry-st-juste... | PO1 | Sporting Clube de Portugal | 8000000.0 | 16000000.0 | 2012-09-22 | 2012 |
| 16624 | 290391 | Nuno | Santos | Nuno Santos | 2023 | 336 | nuno-santos | Portugal | Trofa | Portugal | ... | 2027-06-30 00:00:00 | Jorge Pires | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/nuno-santos/pr... | PO1 | Sporting Clube de Portugal | 12000000.0 | 14000000.0 | 2012-10-06 | 2012 |
| 17431 | 314367 | Marcus | Edwards | Marcus Edwards | 2023 | 336 | marcus-edwards | England | London | England | ... | 2026-06-30 00:00:00 | ARETÉ | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/marcus-edwards... | PO1 | Sporting Clube de Portugal | 28000000.0 | 30000000.0 | 2012-11-24 | 2012 |
| 17888 | 325443 | Viktor | Gyökeres | Viktor Gyökeres | 2023 | 336 | viktor-gyokeres | Sweden | Stockholm | Sweden | ... | 2028-06-30 00:00:00 | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/viktor-gyokere... | PO1 | Sporting Clube de Portugal | 45000000.0 | 45000000.0 | 2013-03-02 | 2013 |
| 20076 | 379249 | NaN | Matheus Reis | Matheus Reis | 2023 | 336 | matheus-reis | Brazil | São João da Boa Vista | Brazil | ... | 2025-06-30 00:00:00 | ADM Esporte | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/matheus-reis/p... | PO1 | Sporting Clube de Portugal | 8000000.0 | 8000000.0 | 2013-04-03 | 2013 |
| 20969 | 400831 | Rafael | Camacho | Rafael Camacho | 2023 | 336 | rafael-camacho | Portugal | Lisboa | Portugal | ... | 2024-06-30 00:00:00 | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rafael-camacho... | PO1 | Sporting Clube de Portugal | 1500000.0 | 4000000.0 | 2012-08-26 | 2012 |
| 21317 | 412669 | NaN | Trincão | Trincão | 2023 | 336 | trincao | Portugal | Viana do Castelo | Portugal | ... | 2026-06-30 00:00:00 | Gestifute | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/trincao/profil... | PO1 | Sporting Clube de Portugal | 10000000.0 | 30000000.0 | 2012-08-04 | 2012 |
| 21718 | 426620 | Pedro | Gonçalves | Pedro Gonçalves | 2023 | 336 | pedro-goncalves | Portugal | Chaves | Portugal | ... | 2027-06-30 00:00:00 | Jorge Pires | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/pedro-goncalve... | PO1 | Sporting Clube de Portugal | 28000000.0 | 38000000.0 | 2012-09-23 | 2012 |
| 22571 | 461496 | Morten | Hjulmand | Morten Hjulmand | 2023 | 336 | morten-hjulmand | Denmark | Kastrup | Denmark | ... | 2028-06-30 00:00:00 | Boutique Transfers | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/morten-hjulman... | PO1 | Sporting Clube de Portugal | 26000000.0 | 26000000.0 | 2012-11-17 | 2012 |
| 22614 | 461943 | Daniel | Bragança | Daniel Bragança | 2023 | 336 | daniel-braganca | Portugal | Almeirim | Portugal | ... | 2027-06-30 00:00:00 | Gestifute | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/daniel-braganc... | PO1 | Sporting Clube de Portugal | 5000000.0 | 8000000.0 | 2012-11-11 | 2012 |
| 23352 | 492319 | Franco | Israel | Franco Israel | 2023 | 336 | franco-israel | Uruguay | Nueva Helvecia | Uruguay | ... | 2027-06-30 00:00:00 | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/franco-israel/... | PO1 | Sporting Clube de Portugal | 2500000.0 | 2500000.0 | 2013-02-01 | 2013 |
| 23600 | 504073 | Eduardo | Quaresma | Eduardo Quaresma | 2023 | 336 | eduardo-quaresma | Portugal | Barreiro | Portugal | ... | 2025-06-30 00:00:00 | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/eduardo-quares... | PO1 | Sporting Clube de Portugal | 2500000.0 | 7000000.0 | 2013-01-19 | 2013 |
| 24035 | 524315 | Hidemasa | Morita | Hidemasa Morita | 2023 | 336 | hidemasa-morita | Japan | Takatsuki, Osaka | Japan | ... | 2026-06-30 00:00:00 | HEROE | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/hidemasa-morit... | PO1 | Sporting Clube de Portugal | 12000000.0 | 12000000.0 | 2013-04-12 | 2013 |
| 24692 | 549006 | Gonçalo | Inácio | Gonçalo Inácio | 2023 | 336 | goncalo-inacio | Portugal | Almada | Portugal | ... | 2027-06-30 00:00:00 | MRP.POSITIONUMBER | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/goncalo-inacio... | PO1 | Sporting Clube de Portugal | 40000000.0 | 40000000.0 | 2013-03-30 | 2013 |
| 27229 | 670717 | Dário | Essugo | Dário Essugo | 2023 | 336 | dario-essugo | Portugal | Lisboa | Portugal | ... | 2025-06-30 00:00:00 | Promoesport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/dario-essugo/p... | PO1 | Sporting Clube de Portugal | 2000000.0 | 2000000.0 | 2012-11-10 | 2012 |
| 27708 | 701979 | Geny | Catamo | Geny Catamo | 2023 | 336 | geny-catamo | Mozambique | Maputo | Mozambique | ... | 2028-06-30 00:00:00 | Promoesport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/geny-catamo/pr... | PO1 | Sporting Clube de Portugal | 4000000.0 | 4000000.0 | 2013-01-23 | 2013 |
| 28125 | 726701 | Rodrigo | Ribeiro | Rodrigo Ribeiro | 2023 | 336 | rodrigo-ribeiro | Portugal | Viana do Castelo | Portugal | ... | 2025-06-30 00:00:00 | PCR Sports | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rodrigo-ribeir... | PO1 | Sporting Clube de Portugal | 3000000.0 | 3000000.0 | 2013-01-19 | 2013 |
| 29872 | 946883 | Iván | Fresneda | Iván Fresneda | 2023 | 336 | ivan-fresneda | Spain | Madrid | Spain | ... | 2028-06-30 00:00:00 | LIAN Sports Group | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ivan-fresneda/... | PO1 | Sporting Clube de Portugal | 13000000.0 | 15000000.0 | 2013-05-11 | 2013 |
| 29987 | 974506 | NaN | Rafael | Rafael | 2023 | 336 | rafael | Brazil | Sete Lagoas | Brazil | ... | 2028-06-30 00:00:00 | ADRIANO SPORTS | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rafael/profil/... | PO1 | Sporting Clube de Portugal | 375000.0 | 375000.0 | 2012-08-22 | 2012 |
| 29988 | 974982 | Ousmane | Diomande | Ousmane Diomande | 2023 | 336 | ousmane-diomande | Cote d'Ivoire | Abidjan | Cote d'Ivoire | ... | 2027-06-30 00:00:00 | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ousmane-dioman... | PO1 | Sporting Clube de Portugal | 40000000.0 | 40000000.0 | 2012-08-22 | 2012 |
| 30021 | 984439 | João | Muniz | João Muniz | 2023 | 336 | joao-muniz | Portugal | Ribeira Brava | Portugal | ... | 2028-06-30 00:00:00 | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/joao-muniz/pro... | PO1 | Sporting Clube de Portugal | 500000.0 | 500000.0 | 2012-09-26 | 2012 |
26 rows × 25 columns
sporting_players_2023['market_value_in_million'] = sporting_players_2023['market_value_in_eur'] / 1e6
sporting_players_2023
/var/folders/d2/pwql6b_j2sjbv968h9tr70_h0000gn/T/ipykernel_60628/329111145.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
| player_id | first_name | last_name | name | last_season | current_club_id | player_code | country_of_birth | city_of_birth | country_of_citizenship | ... | agent_name | image_url | url | current_club_domestic_competition_id | current_club_name | market_value_in_eur | highest_market_value_in_eur | date | year | market_value_in_million | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3743 | 44062 | Antonio | Adán | Antonio Adán | 2023 | 336 | antonio-adan | Spain | Madrid | Spain | ... | Gesport Espizua SL | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/antonio-adan/p... | PO1 | Sporting Clube de Portugal | 800000.0 | 8000000.0 | 2012-07-22 | 2012 | 0.800 |
| 4819 | 56036 | Luís | Neto | Luís Neto | 2023 | 336 | luis-neto | Portugal | Póvoa do Varzim | Portugal | ... | Team of Future | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/luis-neto/prof... | PO1 | Sporting Clube de Portugal | 300000.0 | 6000000.0 | 2012-10-20 | 2012 | 0.300 |
| 7900 | 102427 | Sebastián | Coates | Sebastián Coates | 2023 | 336 | sebastian-coates | Uruguay | Montevideo | Uruguay | ... | InterLex Sport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/sebastian-coat... | PO1 | Sporting Clube de Portugal | 4000000.0 | 15000000.0 | 2012-11-11 | 2012 | 4.000 |
| 9702 | 139395 | Ricardo | Esgaio | Ricardo Esgaio | 2023 | 336 | ricardo-esgaio | Portugal | Nazaré | Portugal | ... | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ricardo-esgaio... | PO1 | Sporting Clube de Portugal | 3500000.0 | 6000000.0 | 2013-03-09 | 2013 | 3.500 |
| 13063 | 211072 | NaN | Paulinho | Paulinho | 2023 | 336 | paulinho | Portugal | Barcelos | Portugal | ... | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/paulinho/profi... | PO1 | Sporting Clube de Portugal | 6500000.0 | 15000000.0 | 2013-05-16 | 2013 | 6.500 |
| 16488 | 288253 | Jerry | St. Juste | Jerry St. Juste | 2023 | 336 | jerry-st-juste | Netherlands | Groningen | Netherlands | ... | St. Juste Management | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/jerry-st-juste... | PO1 | Sporting Clube de Portugal | 8000000.0 | 16000000.0 | 2012-09-22 | 2012 | 8.000 |
| 16624 | 290391 | Nuno | Santos | Nuno Santos | 2023 | 336 | nuno-santos | Portugal | Trofa | Portugal | ... | Jorge Pires | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/nuno-santos/pr... | PO1 | Sporting Clube de Portugal | 12000000.0 | 14000000.0 | 2012-10-06 | 2012 | 12.000 |
| 17431 | 314367 | Marcus | Edwards | Marcus Edwards | 2023 | 336 | marcus-edwards | England | London | England | ... | ARETÉ | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/marcus-edwards... | PO1 | Sporting Clube de Portugal | 28000000.0 | 30000000.0 | 2012-11-24 | 2012 | 28.000 |
| 17888 | 325443 | Viktor | Gyökeres | Viktor Gyökeres | 2023 | 336 | viktor-gyokeres | Sweden | Stockholm | Sweden | ... | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/viktor-gyokere... | PO1 | Sporting Clube de Portugal | 45000000.0 | 45000000.0 | 2013-03-02 | 2013 | 45.000 |
| 20076 | 379249 | NaN | Matheus Reis | Matheus Reis | 2023 | 336 | matheus-reis | Brazil | São João da Boa Vista | Brazil | ... | ADM Esporte | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/matheus-reis/p... | PO1 | Sporting Clube de Portugal | 8000000.0 | 8000000.0 | 2013-04-03 | 2013 | 8.000 |
| 20969 | 400831 | Rafael | Camacho | Rafael Camacho | 2023 | 336 | rafael-camacho | Portugal | Lisboa | Portugal | ... | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rafael-camacho... | PO1 | Sporting Clube de Portugal | 1500000.0 | 4000000.0 | 2012-08-26 | 2012 | 1.500 |
| 21317 | 412669 | NaN | Trincão | Trincão | 2023 | 336 | trincao | Portugal | Viana do Castelo | Portugal | ... | Gestifute | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/trincao/profil... | PO1 | Sporting Clube de Portugal | 10000000.0 | 30000000.0 | 2012-08-04 | 2012 | 10.000 |
| 21718 | 426620 | Pedro | Gonçalves | Pedro Gonçalves | 2023 | 336 | pedro-goncalves | Portugal | Chaves | Portugal | ... | Jorge Pires | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/pedro-goncalve... | PO1 | Sporting Clube de Portugal | 28000000.0 | 38000000.0 | 2012-09-23 | 2012 | 28.000 |
| 22571 | 461496 | Morten | Hjulmand | Morten Hjulmand | 2023 | 336 | morten-hjulmand | Denmark | Kastrup | Denmark | ... | Boutique Transfers | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/morten-hjulman... | PO1 | Sporting Clube de Portugal | 26000000.0 | 26000000.0 | 2012-11-17 | 2012 | 26.000 |
| 22614 | 461943 | Daniel | Bragança | Daniel Bragança | 2023 | 336 | daniel-braganca | Portugal | Almeirim | Portugal | ... | Gestifute | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/daniel-braganc... | PO1 | Sporting Clube de Portugal | 5000000.0 | 8000000.0 | 2012-11-11 | 2012 | 5.000 |
| 23352 | 492319 | Franco | Israel | Franco Israel | 2023 | 336 | franco-israel | Uruguay | Nueva Helvecia | Uruguay | ... | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/franco-israel/... | PO1 | Sporting Clube de Portugal | 2500000.0 | 2500000.0 | 2013-02-01 | 2013 | 2.500 |
| 23600 | 504073 | Eduardo | Quaresma | Eduardo Quaresma | 2023 | 336 | eduardo-quaresma | Portugal | Barreiro | Portugal | ... | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/eduardo-quares... | PO1 | Sporting Clube de Portugal | 2500000.0 | 7000000.0 | 2013-01-19 | 2013 | 2.500 |
| 24035 | 524315 | Hidemasa | Morita | Hidemasa Morita | 2023 | 336 | hidemasa-morita | Japan | Takatsuki, Osaka | Japan | ... | HEROE | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/hidemasa-morit... | PO1 | Sporting Clube de Portugal | 12000000.0 | 12000000.0 | 2013-04-12 | 2013 | 12.000 |
| 24692 | 549006 | Gonçalo | Inácio | Gonçalo Inácio | 2023 | 336 | goncalo-inacio | Portugal | Almada | Portugal | ... | MRP.POSITIONUMBER | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/goncalo-inacio... | PO1 | Sporting Clube de Portugal | 40000000.0 | 40000000.0 | 2013-03-30 | 2013 | 40.000 |
| 27229 | 670717 | Dário | Essugo | Dário Essugo | 2023 | 336 | dario-essugo | Portugal | Lisboa | Portugal | ... | Promoesport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/dario-essugo/p... | PO1 | Sporting Clube de Portugal | 2000000.0 | 2000000.0 | 2012-11-10 | 2012 | 2.000 |
| 27708 | 701979 | Geny | Catamo | Geny Catamo | 2023 | 336 | geny-catamo | Mozambique | Maputo | Mozambique | ... | Promoesport | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/geny-catamo/pr... | PO1 | Sporting Clube de Portugal | 4000000.0 | 4000000.0 | 2013-01-23 | 2013 | 4.000 |
| 28125 | 726701 | Rodrigo | Ribeiro | Rodrigo Ribeiro | 2023 | 336 | rodrigo-ribeiro | Portugal | Viana do Castelo | Portugal | ... | PCR Sports | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rodrigo-ribeir... | PO1 | Sporting Clube de Portugal | 3000000.0 | 3000000.0 | 2013-01-19 | 2013 | 3.000 |
| 29872 | 946883 | Iván | Fresneda | Iván Fresneda | 2023 | 336 | ivan-fresneda | Spain | Madrid | Spain | ... | LIAN Sports Group | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ivan-fresneda/... | PO1 | Sporting Clube de Portugal | 13000000.0 | 15000000.0 | 2013-05-11 | 2013 | 13.000 |
| 29987 | 974506 | NaN | Rafael | Rafael | 2023 | 336 | rafael | Brazil | Sete Lagoas | Brazil | ... | ADRIANO SPORTS | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/rafael/profil/... | PO1 | Sporting Clube de Portugal | 375000.0 | 375000.0 | 2012-08-22 | 2012 | 0.375 |
| 29988 | 974982 | Ousmane | Diomande | Ousmane Diomande | 2023 | 336 | ousmane-diomande | Cote d'Ivoire | Abidjan | Cote d'Ivoire | ... | NaN | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/ousmane-dioman... | PO1 | Sporting Clube de Portugal | 40000000.0 | 40000000.0 | 2012-08-22 | 2012 | 40.000 |
| 30021 | 984439 | João | Muniz | João Muniz | 2023 | 336 | joao-muniz | Portugal | Ribeira Brava | Portugal | ... | Proeleven S.A. | https://img.a.transfermarkt.technology/portrai... | https://www.transfermarkt.co.uk/joao-muniz/pro... | PO1 | Sporting Clube de Portugal | 500000.0 | 500000.0 | 2012-09-26 | 2012 | 0.500 |
26 rows × 26 columns
jogadores_sorted = sporting_players_2023.sort_values(by='market_value_in_million', ascending=False).head(10)
# Creating graph
plt.figure(figsize=(10, 6))
plt.bar(jogadores_sorted['name'], jogadores_sorted['market_value_in_million'], color='green')
plt.xlabel('Player')
plt.ylabel('Market Value (EUR)')
plt.title('Top 10 Market Value Sporting Clube de Portugal Players (23/24)')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In the graph we can see that Viktor Gyokeres is the most 'valuable' player of Sporting followed by Goncalo Igacio and Diomande. Gyokeres is the highest purchase of Sporting of all time and his performance during this season makes him the most valuable player of Sporting at the moment.
Of this top 3 only Gyokeres plays in the attack, Inacio and Diomande play as defenders.
The purpose of our analyse was to verify:
Who are the 20 players that scored more goals
Who are the 20 players that assisted more times
Who has more contribution to goals
Who are the 20 players that need less time to score a goal
Who are the 20 players that need less time to assist someone to score
Top 20 players that got more yellow cards
Top 20 players that got more red cards
Top 20 players with more card combination.
I started by importing the libraries, I found that there was a lot of missing data in some tables but as this data would not interfere with my study I decided not to change anything.
After changing some data, I started by doing my analyses.
I found that Lewandowski was the player with the highest number of goals (maybe because our database did not have records of the Arab league and MLS, which certainly influenced the study).
Note( Lewandowski has a total of 46014min, Messi 43543min and Ronaldo 41150min)
Messi was the player with the most assists, followed by De Bruyne and Muller
Note( Messi has a total of 43543min, De Bruyne 35054min and Muller 34384min)
The number of minutes played certainly affects the data a lot.
Messi, Lewandowski and Ronaldo were the players who contributed the most to the goal:
Ronaldo, Messi and Mbappe are the players who need less time to score a goal:
De Bruyne, Muller and Neymar Jr. are the players who need the least time to make an assist:
Speaking of contribution per minute, Messi, Neymar and Kylian Mbappe are the players who need fewer minutes on average to contribute to goal (goal/assist):
In a second part, I studied which players had the most yellow, red and total cards.
I noticed that Danilo, Paulinho and Marcelo, all Brazilian players, saw more yellow cards.They're also top 3 of on sum of yellow and red cards.
Marcelo, Felipe and Diagne were the players who saw the most red cards.
I filtered the 3 players that contributed the most for goals and assists over the years. Lewandowski, Messi and Ronaldo were the top 3 players that contributed the most, so I wanted to draw a graph that could show how they've done it over the years. I could see that in the first half of the years, records were broken by Ronaldo and Messi but over the years Lewandowski started to take the lead.
The peaks are:
To finish my study, I spoke a little about Sporting, I saw that Sporting's stadium had the highest attendance in 2017 with an average of more than 40k people per game and a lowest attendance during Covid-19 pandemic where there were a lot of games without supporters in the stadium. Vyktor Gyokeres , Goncalo Inacio and Diomande are the players with highest value in the market at the moment.